# HG changeset patch
# User peterjc
# Date 1375184026 14400
# Node ID 688f3fb09a6a737631350753cf4bfb2a2434b026
# Parent c1a6e5aefee017275a01211b8b66a0625c40e68e
Uploaded v0.0.20 preview 11, moved to GitHub, MIT license, reST markup.
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/README.rst Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,166 @@
+Galaxy wrappers for NCBI BLAST+ suite
+=====================================
+
+These wrappers are copyright 2010-2013 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below.
+
+Currently tested with NCBI BLAST 2.2.26+ (i.e. version 2.2.26 of BLAST+),
+and does not work with the NCBI 'legacy' BLAST suite (e.g. blastall).
+
+Note that these wrappers (and the associated datatypes) were originally
+distributed as part of the main Galaxy repository, but as of August 2012
+moved to the Galaxy Tool Shed as 'ncbi_blast_plus' (and 'blast_datatypes').
+My thanks to Dannon Baker from the Galaxy development team for his assistance
+with this.
+
+These wrappers are available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
+Automated Installation
+======================
+
+Galaxy should be able to automatically install the dependencies, i.e. the
+'blast_datatypes' repository which defines the BLAST XML file format
+('blastxml') and protein and nucleotide BLAST databases ('blastdbp' and
+'blastdbn').
+
+You must tell Galaxy about any system level BLAST databases using the
+configuration files blastdb.loc (nucleotide databases like NT), blastdb_p.loc
+(protein databases like NR) and blastdb_d.loc (protein domain databases like
+CDD or SMART), which are located in the tool-data/ folder. Sample files are
+included which explain the tab-separated format to use.
+
+You can download the NCBI provided databases as tar-balls from here:
+
+* ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (nucleotide and protein databases like NR)
+* ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ (domain databases like CDD)
+
+
+Manual Installation
+===================
+
+For those not using Galaxy's automated installation from the Tool Shed, put
+the XML and Python files in the tools/ncbi_blast_plus/ folder and add the XML
+files to your tool_conf.xml as normal (and do the same in tool_conf.xml.sample
+in order to run the unit tests). For example, use::
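+
+  <section name="NCBI BLAST+" id="ncbi_blast_plus_tools">
+    <tool file="ncbi_blast_plus/ncbi_blastn_wrapper.xml" />
+    <tool file="ncbi_blast_plus/ncbi_blastp_wrapper.xml" />
+    <tool file="ncbi_blast_plus/ncbi_blastx_wrapper.xml" />
+    <tool file="ncbi_blast_plus/ncbi_tblastn_wrapper.xml" />
+    <tool file="ncbi_blast_plus/ncbi_tblastx_wrapper.xml" />
+    <tool file="ncbi_blast_plus/ncbi_rpsblast_wrapper.xml" />
+    <tool file="ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml" />
+    <tool file="ncbi_blast_plus/ncbi_makeblastdb.xml" />
+    <tool file="ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml" />
+    <tool file="ncbi_blast_plus/ncbi_blastdbcmd_info.xml" />
+    <tool file="ncbi_blast_plus/blastxml_to_tabular.xml" />
+  </section>
+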
+You will also need to install 'blast_datatypes' from the Tool Shed. This
+defines the BLAST XML file format ('blastxml') and protein and nucleotide
+BLAST databases composite file formats ('blastdbp' and 'blastdbn').
+
+As described above for an automated installation, you must also tell Galaxy
+about any system level BLAST databases using the tool-data/blastdb*.loc files.
+
+You must install the NCBI BLAST+ standalone tools somewhere on the system
+path. Currently the unit tests are written using "BLAST 2.2.26+".
+
+Run the functional tests (adjusting the section identifier to match your
+tool_conf.xml.sample file)::
+
+ ./run_functional_tests.sh -sid NCBI_BLAST+-ncbi_blast_plus_tools
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.11 - Final revision as part of the Galaxy main repository, and the
+          first release via the Tool Shed
+v0.0.12 - Implements genetic code option for translation searches.
+        - Changes to 1000 sequences at a time (to cope with
+          very large sets of queries where BLAST+ can become memory hungry)
+        - Include warning that BLAST+ with subject FASTA gives pairwise
+          e-values
+v0.0.13 - Use the new error handling options in Galaxy (the previously
+          bundled hide_stderr.py script is no longer needed).
+v0.0.14 - Support for makeblastdb and blastdbinfo with local BLAST databases
+          in the history (using work from Edward Kirton), requires v0.0.14
+          of the 'blast_datatypes' repository from the Tool Shed.
+v0.0.15 - Stronger warning in help text against searching against subject
+          FASTA files (better looking e-values than you might be expecting).
+v0.0.16 - Added repository_dependencies.xml for automated installation of the
+          'blast_datatypes' repository from the Tool Shed.
+v0.0.17 - The BLAST+ search tools now default to extended tabular output
+          (all too often our users were having to re-run searches just to
+          get one of the missing columns like query or subject length)
+v0.0.18 - Defensive quoting of filenames in case of spaces (where possible,
+          BLAST+ handling of some multi-file arguments is problematic).
+v0.0.19 - Added wrappers for rpsblast and rpstblastn, and new blastdb_d.loc
+          for the domain databases they use (e.g. CDD, PFAM or SMART).
+        - Correct case of exception regular expression (for error handling
+          fall-back in case the return code is not set properly).
+        - Clearer naming of output files.
+v0.0.20 - Added unit tests for BLASTN and TBLASTX.
+        - Added percentage identity option to BLASTN.
+        - Fallback on ElementTree if cElementTree missing in XML to tabular.
+        - Link to Tool Shed added to help text and this documentation.
+        - Tweak dependency on blast_datatypes to also work on the Test Tool Shed.
+        - Adopted standard MIT License.
+        - Development moved to GitHub, https://github.com/peterjc/galaxy_blast
+======= ======================================================================
+
+
+Bug Reports
+===========
+
+You can file an issue at https://github.com/peterjc/galaxy_blast/issues or ask
+us on the Galaxy development list at http://lists.bx.psu.edu/listinfo/galaxy-dev
+
+
+Developers
+==========
+
+This script and related tools were originally developed on the 'tools' branch
+of the following Mercurial repository:
+https://bitbucket.org/peterjc/galaxy-central/
+
+As of July 2013, development is continuing on a dedicated GitHub repository:
+https://github.com/peterjc/galaxy_blast
+
+To make the tarball for the Galaxy Tool Shed (http://toolshed.g2.bx.psu.edu/),
+I use the following command from the GitHub repository root folder::
+
+ $ ./ncbi_blast_plus/make_ncbi_blast_plus.sh
+
+This simplifies ensuring a consistent set of files is bundled each time,
+including all the relevant test files.
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
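The README above asks administrators to register databases in tab-separated blastdb*.loc files and to put the BLAST+ tools on the system path. The following Python 3 sketch could be used to sanity-check both before running the functional tests. It assumes that each data line in a .loc file has three tab-separated columns: a unique value, a display name and the database base path. The wrappers do read a `path` field, but you should check the column layout against the bundled sample files. The sketch also assumes that the first line of `blastn -version` output looks like `blastn: 2.2.26+`. The helper names `parse_loc_lines`, `parse_blast_version` and `installed_blast_version` are hypothetical.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def parse_loc_lines(lines: List[str]) -> Dict[str, Dict[str, str]]:
    """Parse blastdb*.loc content into {value: {"name": ..., "path": ...}}.

    ASSUMPTION: three tab-separated columns (value, display name, path);
    check this against the sample .loc files shipped with the wrappers.
    """
    databases: Dict[str, Dict[str, str]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError("Expected 3 tab-separated columns: %r" % line)
        value, name, path = parts
        databases[value] = {"name": name, "path": path}
    return databases


def parse_blast_version(text: str) -> Optional[str]:
    """Return e.g. '2.2.26+' from a first line such as 'blastn: 2.2.26+'."""
    lines = text.strip().splitlines()
    if not lines:
        return None
    _, sep, version = lines[0].partition(":")
    return version.strip() if sep else None


def installed_blast_version(tool: str = "blastn") -> Optional[str]:
    """Run '<tool> -version' if the tool is on PATH, else return None."""
    exe = shutil.which(tool)
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return parse_blast_version(result.stdout)
```

For example, `installed_blast_version("blastn")` should return `"2.2.26+"` on a system matching the tested setup, and `None` if BLAST+ is not on the path.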
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/blastxml_to_tabular.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/blastxml_to_tabular.py Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,261 @@
+#!/usr/bin/env python
+"""Convert a BLAST XML file to tabular output.
+
+Takes three command line options, input BLAST XML filename, output tabular
+BLAST filename, output format (std for standard 12 columns, or ext for the
+extended 24 columns offered in the BLAST+ wrappers).
+
+The 12 columns output are 'qseqid sseqid pident length mismatch gapopen qstart
+qend sstart send evalue bitscore' or 'std' at the BLAST+ command line, which
+mean:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+     1 qseqid    Query Seq-id (ID of your sequence)
+     2 sseqid    Subject Seq-id (ID of the database hit)
+     3 pident    Percentage of identical matches
+     4 length    Alignment length
+     5 mismatch  Number of mismatches
+     6 gapopen   Number of gap openings
+     7 qstart    Start of alignment in query
+     8 qend      End of alignment in query
+     9 sstart    Start of alignment in subject (database hit)
+    10 send      End of alignment in subject (database hit)
+    11 evalue    Expectation value (E-value)
+    12 bitscore  Bit score
+====== ========= ============================================
+
+The additional columns offered in the Galaxy BLAST+ wrappers are:
+
+====== ============= ===========================================
+Column NCBI name     Description
+------ ------------- -------------------------------------------
+    13 sallseqid     All subject Seq-id(s), separated by a ';'
+    14 score         Raw score
+    15 nident        Number of identical matches
+    16 positive      Number of positive-scoring matches
+    17 gaps          Total number of gaps
+    18 ppos          Percentage of positive-scoring matches
+    19 qframe        Query frame
+    20 sframe        Subject frame
+    21 qseq          Aligned part of query sequence
+    22 sseq          Aligned part of subject sequence
+    23 qlen          Query sequence length
+    24 slen          Subject sequence length
+====== ============= ===========================================
+
+Most of these fields are given explicitly in the XML file, but others, like
+the percentage identity and the number of gap openings, must be calculated.
+
+Be aware that the sequence in the extended tabular output or XML direct from
+BLAST+ may or may not use XXXX masking on regions of low complexity. This
+can throw off the calculation of percentage identity and gap openings.
+[In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,
+with these numbers changing depending on whether or not the low complexity
+filter is used.]
+
+This script attempts to produce identical output to what BLAST+ would have done.
+However, check this with "diff -b ..." since BLAST+ sometimes includes an extra
+space character (probably a bug).
+"""
+import sys
+import re
+
+if "-v" in sys.argv or "--version" in sys.argv:
+    print "v0.0.12"
+    sys.exit(0)
+
+if sys.version_info[:2] >= ( 2, 5 ):
+    try:
+        from xml.etree import cElementTree as ElementTree
+    except ImportError:
+        from xml.etree import ElementTree as ElementTree
+else:
+    from galaxy import eggs
+    import pkg_resources; pkg_resources.require( "elementtree" )
+    from elementtree import ElementTree
+
+def stop_err( msg ):
+    sys.stderr.write("%s\n" % msg)
+    sys.exit(1)
+
+#Parse Command Line
+try:
+    in_file, out_file, out_fmt = sys.argv[1:]
+except ValueError:
+    stop_err("Expect 3 arguments: input BLAST XML file, output tabular file, out format (std or ext)")
+
+if out_fmt == "std":
+    extended = False
+elif out_fmt == "x22":
+    stop_err("Format argument x22 has been replaced with ext (extended 24 columns)")
+elif out_fmt == "ext":
+    extended = True
+else:
+    stop_err("Format argument should be std (12 column) or ext (extended 24 columns)")
+
+
+# get an iterable
+try:
+    context = ElementTree.iterparse(in_file, events=("start", "end"))
+except:
+    stop_err("Invalid data format.")
+# turn it into an iterator
+context = iter(context)
+# get the root element
+try:
+    event, root = context.next()
+except:
+    stop_err( "Invalid data format." )
+
+
+re_default_query_id = re.compile(r"^Query_\d+$")
+assert re_default_query_id.match("Query_101")
+assert not re_default_query_id.match("Query_101a")
+assert not re_default_query_id.match("MyQuery_101")
+re_default_subject_id = re.compile(r"^Subject_\d+$")
+assert re_default_subject_id.match("Subject_1")
+assert not re_default_subject_id.match("Subject_")
+assert not re_default_subject_id.match("Subject_12a")
+assert not re_default_subject_id.match("TheSubject_1")
+
+
+outfile = open(out_file, 'w')
+blast_program = None
+for event, elem in context:
+    if event == "end" and elem.tag == "BlastOutput_program":
+        blast_program = elem.text
+    # for every <Iteration> tag
+    if event == "end" and elem.tag == "Iteration":
+        #Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA
+        # <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
+        # <Iteration_query-def>Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1</Iteration_query-def>
+        # <Iteration_query-len>406</Iteration_query-len>
+        #
+        #Or, from BLAST 2.2.24+ run online
+        # <Iteration_query-ID>Query_1</Iteration_query-ID>
+        # <Iteration_query-def>Sample</Iteration_query-def>
+        # <Iteration_query-len>516</Iteration_query-len>
+        # ...
+        qseqid = elem.findtext("Iteration_query-ID")
+        if re_default_query_id.match(qseqid):
+            #Place holder ID, take the first word of the query definition
+            qseqid = elem.findtext("Iteration_query-def").split(None,1)[0]
+        qlen = int(elem.findtext("Iteration_query-len"))
+
+        # for every <Hit> within <Iteration>
+        for hit in elem.findall("Iteration_hits/Hit"):
+            #Expecting either this,
+            # <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>
+            # <Hit_def>RecName: Full=Rhodopsin</Hit_def>
+            # <Hit_accession>P56514</Hit_accession>
+            #or,
+            # <Hit_id>Subject_1</Hit_id>
+            # <Hit_def>gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]</Hit_def>
+            # <Hit_accession>Subject_1</Hit_accession>
+            #
+            #apparently depending on the parse_deflines switch
+            sseqid = hit.findtext("Hit_id").split(None,1)[0]
+            hit_def = sseqid + " " + hit.findtext("Hit_def")
+            if re_default_subject_id.match(sseqid) \
+                    and sseqid == hit.findtext("Hit_accession"):
+                #Place holder ID, take the first word of the subject definition
+                hit_def = hit.findtext("Hit_def")
+                sseqid = hit_def.split(None,1)[0]
+            # for every <Hsp> within <Hit>
+            for hsp in hit.findall("Hit_hsps/Hsp"):
+                nident = hsp.findtext("Hsp_identity")
+                length = hsp.findtext("Hsp_align-len")
+                pident = "%0.2f" % (100*float(nident)/float(length))
+
+                q_seq = hsp.findtext("Hsp_qseq")
+                h_seq = hsp.findtext("Hsp_hseq")
+                m_seq = hsp.findtext("Hsp_midline")
+                assert len(q_seq) == len(h_seq) == len(m_seq) == int(length)
+                gapopen = str(len(q_seq.replace('-', ' ').split())-1 +
+                              len(h_seq.replace('-', ' ').split())-1)
+
+                mismatch = m_seq.count(' ') + m_seq.count('+') \
+                         - q_seq.count('-') - h_seq.count('-')
+                #TODO - Remove this alternative mismatch calculation and test
+                #once satisfied there are no problems
+                expected_mismatch = len(q_seq) \
+                                  - sum(1 for q,h in zip(q_seq, h_seq)
+                                        if q == h or q == "-" or h == "-")
+                xx = sum(1 for q,h in zip(q_seq, h_seq) if q=="X" and h=="X")
+                if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= expected_mismatch + xx):
+                    stop_err("%s vs %s mismatches, expected %i <= %i <= %i"
+                             % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),
+                                int(mismatch), expected_mismatch + xx))
+
+                #TODO - Remove this alternative identity calculation and test
+                #once satisfied there are no problems
+                expected_identity = sum(1 for q,h in zip(q_seq, h_seq) if q == h)
+                if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):
+                    stop_err("%s vs %s identities, expected %i <= %i <= %i"
+                             % (qseqid, sseqid, expected_identity - xx, int(nident),
+                                expected_identity + q_seq.count("X")))
+
+
+                evalue = hsp.findtext("Hsp_evalue")
+                if evalue == "0":
+                    evalue = "0.0"
+                else:
+                    evalue = "%0.0e" % float(evalue)
+
+                bitscore = float(hsp.findtext("Hsp_bit-score"))
+                if bitscore < 100:
+                    #Seems to show one decimal place for lower scores
+                    bitscore = "%0.1f" % bitscore
+                else:
+                    #Note BLAST does not round to nearest int, it truncates
+                    bitscore = "%i" % bitscore
+
+                values = [qseqid,
+                          sseqid,
+                          pident,
+                          length, #hsp.findtext("Hsp_align-len")
+                          str(mismatch),
+                          gapopen,
+                          hsp.findtext("Hsp_query-from"), #qstart,
+                          hsp.findtext("Hsp_query-to"), #qend,
+                          hsp.findtext("Hsp_hit-from"), #sstart,
+                          hsp.findtext("Hsp_hit-to"), #send,
+                          evalue, #hsp.findtext("Hsp_evalue") in scientific notation
+                          bitscore, #hsp.findtext("Hsp_bit-score") formatted (truncated if >= 100)
+                          ]
+
+                if extended:
+                    sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(">"))
+                    #print hit_def, "-->", sallseqid
+                    positive = hsp.findtext("Hsp_positive")
+                    ppos = "%0.2f" % (100*float(positive)/float(length))
+                    qframe = hsp.findtext("Hsp_query-frame")
+                    sframe = hsp.findtext("Hsp_hit-frame")
+                    if blast_program == "blastp":
+                        #Probably a bug in BLASTP that they use 0 or 1 depending on format
+                        if qframe == "0": qframe = "1"
+                        if sframe == "0": sframe = "1"
+                    slen = int(hit.findtext("Hit_len"))
+                    values.extend([sallseqid,
+                                   hsp.findtext("Hsp_score"), #score,
+                                   nident,
+                                   positive,
+                                   hsp.findtext("Hsp_gaps"), #gaps,
+                                   ppos,
+                                   qframe,
+                                   sframe,
+                                   #NOTE - for blastp, XML shows original seq, tabular uses XXX masking
+                                   q_seq,
+                                   h_seq,
+                                   str(qlen),
+                                   str(slen),
+                                   ])
+                #print "\t".join(values)
+                outfile.write("\t".join(values) + "\n")
+        # prevents ElementTree from growing large datastructure
+        root.clear()
+        elem.clear()
+outfile.close()
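The column calculations in blastxml_to_tabular.py above can be tried in isolation. This Python 3 sketch restates the script's expressions as small standalone functions, covering percentage identity, gap openings, mismatches, and the E-value and bit score formatting. The function names are hypothetical. The functions are applied to a hand-made ten-column nucleotide alignment. This is an illustration of the arithmetic only, not a replacement for the script.

```python
def pident(nident: int, length: int) -> str:
    """Column 3: percentage of identical positions, to two decimal places."""
    return "%0.2f" % (100.0 * nident / length)


def gap_openings(q_seq: str, h_seq: str) -> int:
    """Column 6: runs of '-' in either aligned sequence.

    Counts gap-free segments minus one per sequence, so (like the script)
    it assumes the alignment does not begin or end with a gap.
    """
    return (len(q_seq.replace("-", " ").split()) - 1
            + len(h_seq.replace("-", " ").split()) - 1)


def mismatches(q_seq: str, h_seq: str, m_seq: str) -> int:
    """Column 5: midline ' ' or '+' columns, excluding the gap columns."""
    return (m_seq.count(" ") + m_seq.count("+")
            - q_seq.count("-") - h_seq.count("-"))


def format_evalue(evalue: str) -> str:
    """Column 11: '0' becomes '0.0', otherwise one significant figure."""
    return "0.0" if evalue == "0" else "%0.0e" % float(evalue)


def format_bitscore(bitscore: float) -> str:
    """Column 12: one decimal place below 100, else truncated (not rounded)."""
    return "%0.1f" % bitscore if bitscore < 100 else "%i" % int(bitscore)


# A hand-made nucleotide HSP: one gap in each sequence, one mismatch at the end
q_seq = "ACGT-ACGTA"
h_seq = "ACGTTAC-TG"
m_seq = "|||| || | "
nident = sum(1 for q, h in zip(q_seq, h_seq) if q == h)

print(pident(nident, len(q_seq)), gap_openings(q_seq, h_seq),
      mismatches(q_seq, h_seq, m_seq))
```

The alignment has one gap in each sequence and a single mismatch in the last column, so this prints `70.00 2 1`. The three midline spaces are reduced to one mismatch once the two gap columns are discounted.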
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/blastxml_to_tabular.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/blastxml_to_tabular.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,137 @@
+
+ Convert BLAST XML output to tabular
+ blastxml_to_tabular.py --version
+
+ blastxml_to_tabular.py $blastxml_file $tabular_file $out_format
+
+**What it does**
+
+NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of
+formats including tabular and a more detailed XML format. A complex workflow
+may need both the XML and the tabular output - but running BLAST twice is
+slow and wasteful.
+
+This tool takes the BLAST XML output and can convert it into the
+standard 12 column tabular equivalent:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+     1 qseqid    Query Seq-id (ID of your sequence)
+     2 sseqid    Subject Seq-id (ID of the database hit)
+     3 pident    Percentage of identical matches
+     4 length    Alignment length
+     5 mismatch  Number of mismatches
+     6 gapopen   Number of gap openings
+     7 qstart    Start of alignment in query
+     8 qend      End of alignment in query
+     9 sstart    Start of alignment in subject (database hit)
+    10 send      End of alignment in subject (database hit)
+    11 evalue    Expectation value (E-value)
+    12 bitscore  Bit score
+====== ========= ============================================
+
+The BLAST+ tools can optionally output additional columns of information,
+but this takes longer to calculate. Most (but not all) of these columns are
+included by selecting the extended tabular output. The extra columns are
+included *after* the standard 12 columns. This is so that you can write
+workflow filtering steps that accept either the 12 or 24 column tabular
+BLAST output. This tool now uses this extended 24 column output by default.
+
+====== ============= ===========================================
+Column NCBI name     Description
+------ ------------- -------------------------------------------
+    13 sallseqid     All subject Seq-id(s), separated by a ';'
+    14 score         Raw score
+    15 nident        Number of identical matches
+    16 positive      Number of positive-scoring matches
+    17 gaps          Total number of gaps
+    18 ppos          Percentage of positive-scoring matches
+    19 qframe        Query frame
+    20 sframe        Subject frame
+    21 qseq          Aligned part of query sequence
+    22 sseq          Aligned part of subject sequence
+    23 qlen          Query sequence length
+    24 slen          Subject sequence length
+====== ============= ===========================================
+
+Beware that the XML file (and thus the conversion) and the tabular output
+direct from BLAST+ may differ in the presence of XXXX masking on regions of
+low complexity (columns 21 and 22), and thus also calculated figures like
+the percentage identity (column 3).
+
+**References**
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
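The help text above notes that the extended columns come *after* the standard 12. This means a downstream step can read either layout. Below is a minimal Python 3 sketch of such a step. It assumes plain tab-separated lines with no header row. The names `parse_tabular_line` and `passes_filter`, and the default thresholds, are hypothetical choices for illustration.

```python
from typing import Dict, List

# Column names as listed in the tables above (NCBI -outfmt 6 field names)
STD_COLUMNS: List[str] = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXT_COLUMNS: List[str] = STD_COLUMNS + [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen",
]


def parse_tabular_line(line: str) -> Dict[str, str]:
    """Map one 12 or 24 column BLAST tabular line to {column name: value}."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == len(STD_COLUMNS):
        names = STD_COLUMNS
    elif len(fields) == len(EXT_COLUMNS):
        names = EXT_COLUMNS
    else:
        raise ValueError("Expected 12 or 24 columns, got %i" % len(fields))
    return dict(zip(names, fields))


def passes_filter(row: Dict[str, str], min_pident: float = 90.0,
                  max_evalue: float = 1e-5) -> bool:
    """Uses only the standard columns, so it works on either layout."""
    return float(row["pident"]) >= min_pident and float(row["evalue"]) <= max_evalue
```

Because the filter only looks up standard column names, the same code works whether the extended columns are present or not. This is the design benefit the help text describes.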
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastdbcmd_info.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_blastdbcmd_info.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,67 @@
+
+ Show BLAST database information from blastdbcmd
+
+ blastdbcmd
+ blast+
+
+ blastdbcmd -version
+
+blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}" -info -out "$info"
+
+**What it does**
+
+Calls the NCBI BLAST+ blastdbcmd command line tool with the -info
+switch to give summary information about a BLAST database, such as
+the size (number of sequences and total length) and date.
+
+-------
+
+**References**
+
+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
+
+Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
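The ncbi_blastdbcmd_info.xml command above simply runs `blastdbcmd` with `-dbtype`, `-db`, `-info` and `-out`. If you script the same call outside Galaxy, you can pass an argument list to `subprocess`. This avoids shell quoting problems with spaces in paths, which was the concern behind the v0.0.18 quoting change. BLAST+ itself may still handle some paths awkwardly, as the README history notes. The sketch below uses a hypothetical helper, `blastdbcmd_info_args`. The accepted `-dbtype` values are an assumption based on the BLAST+ help text, so confirm them with `blastdbcmd -help`.

```python
import shlex
from typing import List

# Values assumed from the BLAST+ help text for blastdbcmd -dbtype
DB_TYPES = ("guess", "nucl", "prot")


def blastdbcmd_info_args(db_type: str, db_path: str, out_path: str) -> List[str]:
    """Argument list equivalent to the wrapper's blastdbcmd -info command."""
    if db_type not in DB_TYPES:
        raise ValueError("db_type should be one of %s" % ", ".join(DB_TYPES))
    return ["blastdbcmd", "-dbtype", db_type, "-db", db_path,
            "-info", "-out", out_path]


args = blastdbcmd_info_args("prot", "/data/my dbs/nr", "info.txt")
# Pass `args` to subprocess.run(args, check=True) with no shell involved;
# shlex.join (Python 3.8+) shows the quoting a shell command line would need.
print(shlex.join(args))
```

Here the database path contains a space, so the shell form quotes it as `'/data/my dbs/nr'`. The list form passed to `subprocess` needs no quoting at all.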
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,139 @@
+
+ Extract sequence(s) from BLAST database
+
+ blastdbcmd
+ blast+
+
+ blastdbcmd -version
+
+## The command is a Cheetah template which allows some Python based syntax.
+## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces
+blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}"
+
+##TODO: What about -ctrl_a and -target_only as advanced options?
+
+#if $id_opts.id_type=="file":
+-entry_batch "$id_opts.entries"
+#else:
+##Perform some simple search/replaces to remove whitespace
+##and make it comma separated, and escape any pipe characters
+-entry "$id_opts.entries.replace('\r',',').replace('\n',',').replace(' ','').replace(',,',',').replace(',,',',').strip(',').replace('|','\|')"
+#end if
+
+##When building a BLAST database, to ensure unique IDs makeblastdb will
+##do things like turning a FASTA entry with ID of ERP44 into lcl|ERP44
+##(if using -parse_seqids) or simply assign it an ID using the record
+##number like gnl|BL_ORD_ID|123 (to cope with duplicate IDs in the FASTA
+##file). In -parse_seqids mode, a duplicate FASTA ID gives an error.
+##
+##The BLAST plain text and XML output will contain these BLAST IDs, but
+##the tabular output does not (at least, not in BLAST 2.2.25+).
+##Therefore in general, Galaxy users won't care about the (internal)
+##BLAST identifiers.
+##
+##The blastdbcmd FASTA output will also contain these IDs, but in the
+##context of the BLAST tabular output they are not helpful. Therefore
+##to recover the original ID as used in the FASTA file for makeblastdb
+##we need a little post-processing.
+##
+##We remove the NCBI's lcl|... or gnl|BL_ORD_ID|123 prefixes
+##using sed, however the exact syntax differs for Mac OS X's sed
+
+#if str($outfmt)=="blastid":
+-out "$seq"
+#else if sys.platform == "darwin":
+| sed -E 's/^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )/>/1' > "$seq"
+#else:
+| sed 's/>\(lcl|\|gnl|BL_ORD_ID|[0-9]* \)/>/1' > "$seq"
+#end if
+
+**What it does**
+
+Extracts FASTA formatted sequences from a BLAST database
+using the NCBI BLAST+ blastdbcmd command line tool.
+
+.. class:: warningmark
+
+**BLAST assigned identifiers**
+
+When a BLAST database is constructed from a FASTA file, the
+original identifiers can be replaced with BLAST assigned
+identifiers, partly to ensure uniqueness. For example, sometimes
+a prefix of 'lcl|' is added (lcl is short for local),
+or an arbitrary name starting 'gnl|BL_ORD_ID|' is created.
+
+If you are using the tabular output from BLAST, it will contain
+the original identifiers - not the BLAST assigned identifiers
+suitable for use with the blastdbcmd tool.
+
+If you are using the XML or plain text output, it will contain the
+BLAST assigned identifiers. However, extracting a list of these
+identifiers from those formats isn't straightforward.
+
+-------
+
+**References**
+
+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
+
+Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
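The ncbi_blastdbcmd_wrapper.xml template above does two pieces of text processing. First, it turns the pasted identifiers into a comma-separated `-entry` value, escaping any `|` characters. Second, it uses `sed` to strip the `lcl|` or `gnl|BL_ORD_ID|<number> ` prefixes from the FASTA headers. This Python 3 sketch restates both steps, and the function names are hypothetical. There is one deliberate difference: the template chains two `.replace(',,', ',')` calls, which can leave `,,` behind after several consecutive blank lines. This sketch instead collapses any run of commas with a regular expression.

```python
import re

# Same alternation as the template's sed command, anchored at the header start
_BLAST_ID_PREFIX = re.compile(r"^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )")


def clean_entries(text: str) -> str:
    """Turn pasted identifiers into a comma-separated -entry value."""
    s = text.replace("\r", ",").replace("\n", ",").replace(" ", "")
    # Collapse any run of commas (the template chains two .replace(',,', ','))
    s = re.sub(r",{2,}", ",", s).strip(",")
    # Escape pipes, as the template does with .replace('|', '\|')
    return s.replace("|", "\\|")


def strip_blast_id_prefix(fasta_text: str) -> str:
    """Remove a leading 'lcl|' or 'gnl|BL_ORD_ID|<n> ' from FASTA headers."""
    return "\n".join(_BLAST_ID_PREFIX.sub(">", line, count=1)
                     for line in fasta_text.split("\n"))
```

For example, a header `>gnl|BL_ORD_ID|12 seq2 more` becomes `>seq2 more`, which recovers the identifier from the original FASTA file passed to makeblastdb. Sequence lines are left unchanged because they do not start with `>`.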
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastn_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_blastn_wrapper.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,257 @@
+
+ Search nucleotide database with nucleotide query sequence(s)
+
+
+
+ blastn
+ blast+
+
+ blastn -version
+
+## The command is a Cheetah template which allows some Python based syntax.
+## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces
+blastn
+-query "$query"
+#if $db_opts.db_opts_selector == "db":
+ -db "${db_opts.database.fields.path}"
+#elif $db_opts.db_opts_selector == "histdb":
+ -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
+#else:
+ -subject "$db_opts.subject"
+#end if
+-task $blast_type
+-evalue $evalue_cutoff
+-out "$output1"
+##Set the extended list here so if/when we add things, saved workflows are not affected
+#if str($out_format)=="ext":
+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
+#else:
+ -outfmt $out_format
+#end if
+-num_threads 8
+#if $adv_opts.adv_opts_selector=="advanced":
+$adv_opts.filter_query
+$adv_opts.strand
+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
+## Note -max_target_seqs overrides -num_descriptions and -num_alignments
+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
+-max_target_seqs $adv_opts.max_hits
+#end if
+#if (str($adv_opts.identity_cutoff) and float(str($adv_opts.identity_cutoff)) > 0 ):
+-perc_identity $adv_opts.identity_cutoff
+#end if
+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
+-word_size $adv_opts.word_size
+#end if
+$adv_opts.ungapped
+$adv_opts.parse_deflines
+## End of advanced options:
+#end if
+
+.. class:: warningmark
+
+**Note**. Database searches may take a substantial amount of time.
+For large input datasets it is advisable to allow overnight processing.
+
+-----
+
+**What it does**
+
+Search a *nucleotide database* using a *nucleotide query*,
+using the NCBI BLAST+ blastn command line tool.
+Algorithms include blastn, megablast, and discontiguous megablast.
+
+.. class:: warningmark
+
+You can also search against a FASTA file of subject nucleotide
+sequences. This is *not* advised because it is slower (only one
+CPU is used), but more importantly gives e-values for pairwise
+searches (very small e-values which will look overly significant).
+In most cases you should instead turn the other FASTA file into a
+database first using *makeblastdb* and search against that.
+
+-----
+
+**Output format**
+
+Because Galaxy focuses on processing tabular data, the default output of this
+tool is tabular. The standard BLAST+ tabular output contains 12 columns:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+     1 qseqid    Query Seq-id (ID of your sequence)
+     2 sseqid    Subject Seq-id (ID of the database hit)
+     3 pident    Percentage of identical matches
+     4 length    Alignment length
+     5 mismatch  Number of mismatches
+     6 gapopen   Number of gap openings
+     7 qstart    Start of alignment in query
+     8 qend      End of alignment in query
+     9 sstart    Start of alignment in subject (database hit)
+    10 send      End of alignment in subject (database hit)
+    11 evalue    Expectation value (E-value)
+    12 bitscore  Bit score
+====== ========= ============================================
+
+The BLAST+ tools can optionally output additional columns of information,
+but this takes longer to calculate. Most (but not all) of these columns are
+included by selecting the extended tabular output. The extra columns are
+included *after* the standard 12 columns. This is so that you can write
+workflow filtering steps that accept either the 12 or 24 column tabular
+BLAST output. Galaxy now uses this extended 24 column output by default.
+
+====== ============= ===========================================
+Column NCBI name     Description
+------ ------------- -------------------------------------------
+    13 sallseqid     All subject Seq-id(s), separated by a ';'
+    14 score         Raw score
+    15 nident        Number of identical matches
+    16 positive      Number of positive-scoring matches
+    17 gaps          Total number of gaps
+    18 ppos          Percentage of positive-scoring matches
+    19 qframe        Query frame
+    20 sframe        Subject frame
+    21 qseq          Aligned part of query sequence
+    22 sseq          Aligned part of subject sequence
+    23 qlen          Query sequence length
+    24 slen          Subject sequence length
+====== ============= ===========================================
+
+The third option is BLAST XML output, which is designed to be parsed by
+another program, and is understood by some Galaxy tools.
+
+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
+The two query-anchored outputs show a multiple sequence alignment between the query and all the matches,
+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
+
+-------
+
+**References**
+
+Zhang et al. A Greedy Algorithm for Aligning DNA Sequences. 2000. J Comput Biol. 7:203-214.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
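The ncbi_blastn_wrapper.xml Cheetah template above adds the optional `-max_target_seqs`, `-perc_identity` and `-word_size` switches only when a positive value is given. It also expands `ext` into an explicit 24 column `-outfmt` string. Here is a Python 3 sketch of that logic as an argument list builder. `blastn_args` is a hypothetical name. The query filter, strand, ungapped and parse_deflines advanced options are omitted for brevity, and the task names in the usage are just example values.

```python
from typing import List, Optional

# The explicit 24 column format string the wrapper uses for "ext"
EXT_OUTFMT = ("6 std sallseqid score nident positive gaps ppos qframe sframe "
              "qseq sseq qlen slen")


def blastn_args(query: str, out: str, task: str, evalue: float, out_format: str,
                db: Optional[str] = None, subject: Optional[str] = None,
                max_hits: int = 0, identity_cutoff: float = 0.0,
                word_size: int = 0, num_threads: int = 8) -> List[str]:
    """Build a blastn argument list mirroring the template's core logic."""
    if (db is None) == (subject is None):
        raise ValueError("Give exactly one of db or subject")
    args = ["blastn", "-query", query]
    args += ["-db", db] if db is not None else ["-subject", subject]
    args += ["-task", task, "-evalue", str(evalue), "-out", out]
    args += ["-outfmt", EXT_OUTFMT if out_format == "ext" else out_format]
    args += ["-num_threads", str(num_threads)]
    # Optional switches are only added for positive values, as in the template
    if max_hits > 0:
        args += ["-max_target_seqs", str(max_hits)]
    if identity_cutoff > 0:
        args += ["-perc_identity", str(identity_cutoff)]
    if word_size > 0:
        args += ["-word_size", str(word_size)]
    return args
```

Spelling out the extended columns explicitly, rather than relying on a preset, is what keeps saved workflows stable if more columns are added later. The template comment makes the same point.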
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastp_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_blastp_wrapper.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,308 @@
+
+ Search protein database with protein query sequence(s)
+
+
+
+ blastp
+ blast+
+
+ blastp -version
+
+## The command is a Cheetah template which allows some Python based syntax.
+## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces
+blastp
+-query "$query"
+#if $db_opts.db_opts_selector == "db":
+ -db "${db_opts.database.fields.path}"
+#elif $db_opts.db_opts_selector == "histdb":
+ -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
+#else:
+ -subject "$db_opts.subject"
+#end if
+-task $blast_type
+-evalue $evalue_cutoff
+-out "$output1"
+##Set the extended list here so if/when we add things, saved workflows are not affected
+#if str($out_format)=="ext":
+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
+#else:
+ -outfmt $out_format
+#end if
+-num_threads 8
+#if $adv_opts.adv_opts_selector=="advanced":
+$adv_opts.filter_query
+-matrix $adv_opts.matrix
+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
+## Note -max_target_seqs overrides -num_descriptions and -num_alignments
+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
+-max_target_seqs $adv_opts.max_hits
+#end if
+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
+-word_size $adv_opts.word_size
+#end if
+##Ungapped disabled for now - see comments below
+##$adv_opts.ungapped
+$adv_opts.parse_deflines
+## End of advanced options:
+#end if
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: warningmark
+
+**Note**. Database searches may take a substantial amount of time.
+For large input datasets it is advisable to allow overnight processing.
+
+-----
+
+**What it does**
+
+Search a *protein database* using a *protein query*,
+using the NCBI BLAST+ blastp command line tool.
+
+.. class:: warningmark
+
+You can also search against a FASTA file of subject protein
+sequences. This is *not* advised because it is slower (only one
+CPU is used) and, more importantly, the e-values are calculated for
+a pairwise search (giving very small e-values which look overly significant).
+In most cases you should instead turn the other FASTA file into a
+database first using *makeblastdb* and search against that.
+
+-----
+
+**Output format**
+
+Because Galaxy focuses on processing tabular data, the default output of this
+tool is tabular. The standard BLAST+ tabular output contains 12 columns:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+ 1 qseqid Query Seq-id (ID of your sequence)
+ 2 sseqid Subject Seq-id (ID of the database hit)
+ 3 pident Percentage of identical matches
+ 4 length Alignment length
+ 5 mismatch Number of mismatches
+ 6 gapopen Number of gap openings
+ 7 qstart Start of alignment in query
+ 8 qend End of alignment in query
+ 9 sstart Start of alignment in subject (database hit)
+ 10 send End of alignment in subject (database hit)
+ 11 evalue Expectation value (E-value)
+ 12 bitscore Bit score
+====== ========= ============================================
+
+The BLAST+ tools can optionally output additional columns of information,
+but this takes longer to calculate. Most (but not all) of these columns are
+included by selecting the extended tabular output. The extra columns are
+included *after* the standard 12 columns. This is so that you can write
+workflow filtering steps that accept either the 12 or 24 column tabular
+BLAST output. Galaxy now uses this extended 24 column output by default.
+
+====== ============= ===========================================
+Column NCBI name Description
+------ ------------- -------------------------------------------
+ 13 sallseqid All subject Seq-id(s), separated by a ';'
+ 14 score Raw score
+ 15 nident Number of identical matches
+ 16 positive Number of positive-scoring matches
+ 17 gaps Total number of gaps
+ 18 ppos Percentage of positive-scoring matches
+ 19 qframe Query frame
+ 20 sframe Subject frame
+ 21 qseq Aligned part of query sequence
+ 22 sseq Aligned part of subject sequence
+ 23 qlen Query sequence length
+ 24 slen Subject sequence length
+====== ============= ===========================================
+
+The third option is BLAST XML output, which is designed to be parsed by
+another program, and is understood by some Galaxy tools.
+
+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
+
+-------
+
+**References**
+
+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
+
+Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
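The blastp help above says workflow filtering steps can accept either the 12 or the 24 column tabular output. A minimal Python sketch of such a reader follows. The function and constant names are illustrative only (not part of Galaxy or BLAST+), and it assumes tab-separated lines in the column order of the wrapper's `-outfmt "6 std sallseqid ..."` string, where `std` expands to the 12 standard columns.

```python
# Column names in the order produced by the wrapper's extended format:
# -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXTRA_COLUMNS = [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen",
]
EXTENDED_COLUMNS = STD_COLUMNS + EXTRA_COLUMNS


def parse_blast_tabular(lines):
    """Yield one dict per hit from 12 or 24 column BLAST+ tabular lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        if len(fields) == 12:
            names = STD_COLUMNS
        elif len(fields) == 24:
            names = EXTENDED_COLUMNS
        else:
            raise ValueError("Expected 12 or 24 columns, got %d" % len(fields))
        # Values stay as strings; the caller decides which to convert
        yield dict(zip(names, fields))
```

For example, `for hit in parse_blast_tabular(open(path)): print(hit["sseqid"], hit["evalue"])` works unchanged on either output, because the extra columns always come after the standard 12.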
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastx_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_blastx_wrapper.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,294 @@
+
+ Search protein database with translated nucleotide query sequence(s)
+
+
+
+ blastx
+ blast+
+
+ blastx -version
+
+## The command is a Cheetah template which allows some Python based syntax.
+## Lines starting with two hashes are comments. Galaxy will turn newlines into spaces
+blastx
+-query "$query"
+#if $db_opts.db_opts_selector == "db":
+ -db "${db_opts.database.fields.path}"
+#elif $db_opts.db_opts_selector == "histdb":
+ -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
+#else:
+ -subject "$db_opts.subject"
+#end if
+-query_gencode $query_gencode
+-evalue $evalue_cutoff
+-out "$output1"
+##Set the extended list here so if/when we add things, saved workflows are not affected
+#if str($out_format)=="ext":
+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
+#else:
+ -outfmt $out_format
+#end if
+-num_threads 8
+#if $adv_opts.adv_opts_selector=="advanced":
+$adv_opts.filter_query
+$adv_opts.strand
+-matrix $adv_opts.matrix
+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
+## Note -max_target_seqs overrides -num_descriptions and -num_alignments
+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
+-max_target_seqs $adv_opts.max_hits
+#end if
+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
+-word_size $adv_opts.word_size
+#end if
+$adv_opts.ungapped
+$adv_opts.parse_deflines
+## End of advanced options:
+#end if
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: warningmark
+
+**Note**. Database searches may take a substantial amount of time.
+For large input datasets it is advisable to allow overnight processing.
+
+-----
+
+**What it does**
+
+Search a *protein database* using a *translated nucleotide query*,
+using the NCBI BLAST+ blastx command line tool.
+
+.. class:: warningmark
+
+You can also search against a FASTA file of subject protein
+sequences. This is *not* advised because it is slower (only one
+CPU is used) and, more importantly, the e-values are calculated for
+a pairwise search (giving very small e-values which look overly significant).
+In most cases you should instead turn the other FASTA file into a
+database first using *makeblastdb* and search against that.
+
+-----
+
+**Output format**
+
+Because Galaxy focuses on processing tabular data, the default output of this
+tool is tabular. The standard BLAST+ tabular output contains 12 columns:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+ 1 qseqid Query Seq-id (ID of your sequence)
+ 2 sseqid Subject Seq-id (ID of the database hit)
+ 3 pident Percentage of identical matches
+ 4 length Alignment length
+ 5 mismatch Number of mismatches
+ 6 gapopen Number of gap openings
+ 7 qstart Start of alignment in query
+ 8 qend End of alignment in query
+ 9 sstart Start of alignment in subject (database hit)
+ 10 send End of alignment in subject (database hit)
+ 11 evalue Expectation value (E-value)
+ 12 bitscore Bit score
+====== ========= ============================================
+
+The BLAST+ tools can optionally output additional columns of information,
+but this takes longer to calculate. Most (but not all) of these columns are
+included by selecting the extended tabular output. The extra columns are
+included *after* the standard 12 columns. This is so that you can write
+workflow filtering steps that accept either the 12 or 24 column tabular
+BLAST output. Galaxy now uses this extended 24 column output by default.
+
+====== ============= ===========================================
+Column NCBI name Description
+------ ------------- -------------------------------------------
+ 13 sallseqid All subject Seq-id(s), separated by a ';'
+ 14 score Raw score
+ 15 nident Number of identical matches
+ 16 positive Number of positive-scoring matches
+ 17 gaps Total number of gaps
+ 18 ppos Percentage of positive-scoring matches
+ 19 qframe Query frame
+ 20 sframe Subject frame
+ 21 qseq Aligned part of query sequence
+ 22 sseq Aligned part of subject sequence
+ 23 qlen Query sequence length
+ 24 slen Subject sequence length
+====== ============= ===========================================
+
+The third option is BLAST XML output, which is designed to be parsed by
+another program, and is understood by some Galaxy tools.
+
+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
+
+-------
+
+**References**
+
+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
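In the blastx template above, `-max_target_seqs` and `-word_size` are only passed when the advanced value is a non-empty string holding a positive integer (the `str(...) and int(str(...)) > 0` test); otherwise BLAST+ keeps its own default. A small Python sketch of the same rule, building an argument list rather than a shell string. The helper names are hypothetical, and like the Cheetah test, a non-numeric value raises `ValueError`.

```python
def positive_int_option(flag, value):
    """Return [flag, value] if value is a positive integer, else [].

    Mirrors the wrapper's Cheetah test: str(x) and int(str(x)) > 0.
    An empty value, or zero or below, means "use the BLAST+ default".
    """
    text = str(value)
    if text and int(text) > 0:
        return [flag, text]
    return []


def advanced_args(max_hits="", word_size=""):
    """Collect the optional advanced arguments for a blastx-style call."""
    args = []
    args += positive_int_option("-max_target_seqs", max_hits)
    args += positive_int_option("-word_size", word_size)
    return args
```

For example, `advanced_args(max_hits="0", word_size="3")` gives `["-word_size", "3"]`: a zero for the hit limit is treated as "not set" rather than passed through.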
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_makeblastdb.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_makeblastdb.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,129 @@
+
+ Make BLAST database
+
+ makeblastdb
+ blast+
+
+ makeblastdb -version
+
+makeblastdb -out "${os.path.join($outfile.extra_files_path,'blastdb')}"
+$parse_seqids
+$hash_index
+## Single call to -in with multiple filenames space separated with outer quotes
+## (presumably any filenames with spaces would be a problem). Note this gives
+## some extra spaces, e.g. -in " file1 file2 file3 " but BLAST seems happy:
+-in "
+#for $i in $in
+${i.file} #end for
+"
+#if $title:
+-title "$title"
+#else:
+##Would default to being based on the cryptic Galaxy filenames, which is unhelpful
+-title "BLAST Database"
+#end if
+-dbtype $dbtype
+## #set $sep = '-mask_data '
+## #for $i in $mask_data
+## $sep${i.file}
+## #set $sep = ', '
+## #end for
+## #set $sep = '-gi_mask -gi_mask_name '
+## #for $i in $gi_mask
+## $sep${i.file}
+## #set $sep = ', '
+## #end for
+## #if $tax.select == 'id':
+## -taxid $tax.id
+## #else if $tax.select == 'map':
+## -taxid_map $tax.map
+## #end if
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+Make BLAST database from one or more FASTA files and/or BLAST databases.
+
+This is a wrapper for the NCBI BLAST+ tool 'makeblastdb', which is the
+replacement for the 'formatdb' tool in the NCBI 'legacy' BLAST suite.
+
+
+
+**Documentation**
+
+http://www.ncbi.nlm.nih.gov/books/NBK1763/
+
+**References**
+
+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
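The makeblastdb template above passes every input file in a single quoted `-in` value separated by spaces, and its comment notes that filenames containing spaces would presumably be a problem. It also substitutes the title "BLAST Database" when none is given. A Python sketch of that argument construction under those assumptions: the function name is hypothetical, it rejects whitespace in filenames up front instead of letting makeblastdb mis-split them, and it uses only the `-out`, `-in`, `-title` and `-dbtype` options that appear in the template.

```python
import os


def makeblastdb_args(fasta_files, out_dir, dbtype, title=""):
    """Build a makeblastdb argument list (a sketch, not the wrapper itself).

    All inputs go into one -in value separated by spaces, as in the
    wrapper, so a filename containing whitespace would be split in two.
    """
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    for name in fasta_files:
        if any(ch.isspace() for ch in name):
            raise ValueError("Filename contains whitespace: %r" % name)
    return [
        "makeblastdb",
        # Database files share the base name 'blastdb' in the output folder
        "-out", os.path.join(out_dir, "blastdb"),
        "-in", " ".join(fasta_files),
        # Default title avoids the cryptic Galaxy dataset filenames
        "-title", title or "BLAST Database",
        "-dbtype", dbtype,
    ]
```

Passing the list to `subprocess.run` avoids shell quoting, but the single `-in` string is still split on spaces by makeblastdb itself, which is why the whitespace check is worth keeping.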
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_rpsblast_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_rpsblast_wrapper.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,238 @@
+
+ Search protein domain database (PSSMs) with protein query sequence(s)
+
+
+
+ rpsblast
+ blast+
+
+ rpsblast -version
+
+## The command is a Cheetah template which allows some Python based syntax.
+## Lines starting with two hashes are comments. Galaxy will turn newlines into spaces
+rpsblast
+-query "$query"
+#if $db_opts.db_opts_selector == "db":
+ -db "${db_opts.database.fields.path}"
+#elif $db_opts.db_opts_selector == "histdb":
+ -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
+#end if
+-evalue $evalue_cutoff
+-out "$output1"
+##Set the extended list here so if/when we add things, saved workflows are not affected
+#if str($out_format)=="ext":
+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
+#else:
+ -outfmt $out_format
+#end if
+-num_threads 8
+#if $adv_opts.adv_opts_selector=="advanced":
+$adv_opts.filter_query
+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
+## Note -max_target_seqs overrides -num_descriptions and -num_alignments
+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
+-max_target_seqs $adv_opts.max_hits
+#end if
+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
+-word_size $adv_opts.word_size
+#end if
+$adv_opts.parse_deflines
+## End of advanced options:
+#end if
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: warningmark
+
+**Note**. Database searches may take a substantial amount of time.
+For large input datasets it is advisable to allow overnight processing.
+
+-----
+
+**What it does**
+
+Search a *protein domain database* using a *protein query*,
+using the NCBI BLAST+ rpsblast command line tool.
+
+The protein domain databases use position-specific scoring matrices
+(PSSMs) and are available for a number of domain collections including:
+
+*CDD* - NCBI curated meta-collection of domains, see
+http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#NCBI_curated_domains
+
+*Kog* - PSSMs from automatically aligned sequences and sequence
+fragments classified in the KOGs resource, the eukaryotic
+counterpart to COGs, see http://www.ncbi.nlm.nih.gov/COG/new/
+
+*Cog* - PSSMs from automatically aligned sequences and sequence
+fragments classified in the COGs resource, which focuses primarily
+on prokaryotes, see http://www.ncbi.nlm.nih.gov/COG/new/
+
+*Pfam* - PSSMs from Pfam-A seed alignment database, see
+http://pfam.sanger.ac.uk/
+
+*Smart* - PSSMs from SMART domain alignment database, see
+http://smart.embl-heidelberg.de/
+
+*Tigr* - PSSMs from TIGRFAM database of protein families, see
+http://www.jcvi.org/cms/research/projects/tigrfams/overview/
+
+*Prk* - PSSMs from automatically aligned stable clusters in the
+Protein Clusters database, see
+http://www.ncbi.nlm.nih.gov/proteinclusters?cmd=search&db=proteinclusters
+
+The exact list of domain databases offered will depend on how your
+local Galaxy has been configured.
+
+-----
+
+**Output format**
+
+Because Galaxy focuses on processing tabular data, the default output of this
+tool is tabular. The standard BLAST+ tabular output contains 12 columns:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+ 1 qseqid Query Seq-id (ID of your sequence)
+ 2 sseqid Subject Seq-id (ID of the database hit)
+ 3 pident Percentage of identical matches
+ 4 length Alignment length
+ 5 mismatch Number of mismatches
+ 6 gapopen Number of gap openings
+ 7 qstart Start of alignment in query
+ 8 qend End of alignment in query
+ 9 sstart Start of alignment in subject (database hit)
+ 10 send End of alignment in subject (database hit)
+ 11 evalue Expectation value (E-value)
+ 12 bitscore Bit score
+====== ========= ============================================
+
+The BLAST+ tools can optionally output additional columns of information,
+but this takes longer to calculate. Most (but not all) of these columns are
+included by selecting the extended tabular output. The extra columns are
+included *after* the standard 12 columns. This is so that you can write
+workflow filtering steps that accept either the 12 or 24 column tabular
+BLAST output. Galaxy now uses this extended 24 column output by default.
+
+====== ============= ===========================================
+Column NCBI name Description
+------ ------------- -------------------------------------------
+ 13 sallseqid All subject Seq-id(s), separated by a ';'
+ 14 score Raw score
+ 15 nident Number of identical matches
+ 16 positive Number of positive-scoring matches
+ 17 gaps Total number of gaps
+ 18 ppos Percentage of positive-scoring matches
+ 19 qframe Query frame
+ 20 sframe Subject frame
+ 21 qseq Aligned part of query sequence
+ 22 sseq Aligned part of subject sequence
+ 23 qlen Query sequence length
+ 24 slen Subject sequence length
+====== ============= ===========================================
+
+The third option is BLAST XML output, which is designed to be parsed by
+another program, and is understood by some Galaxy tools.
+
+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
+
+-------
+
+**References**
+
+Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W327-31.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
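Unlike blastp, the rpsblast template above offers only two database choices and no `-subject` alternative: a locally installed domain database whose path comes from the `.loc` file, or a history database created by makeblastdb, whose files live under the dataset's extra files directory with the base name `blastdb`. A Python sketch of that selection, with a hypothetical function name and the selector values taken from the template:

```python
import os


def resolve_db_args(selector, loc_path=None, extra_files_path=None):
    """Return the -db arguments for the wrapper's two database choices.

    "db"     -> locally installed database, path taken from the .loc file
    "histdb" -> history database whose files share the base name 'blastdb'
    rpsblast has no -subject option, so any other selector is rejected.
    """
    if selector == "db":
        return ["-db", loc_path]
    if selector == "histdb":
        return ["-db", os.path.join(extra_files_path, "blastdb")]
    raise ValueError("Unknown database selector: %r" % selector)
```

Raising on an unknown selector mirrors the template's lack of an `#else` branch, where no `-db` argument would be emitted at all.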
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,239 @@
+
+ Search protein domain database (PSSMs) with translated nucleotide query sequence(s)
+
+
+
+ rpstblastn
+ blast+
+
+ rpstblastn -version
+
+## The command is a Cheetah template which allows some Python based syntax.
+## Lines starting with two hashes are comments. Galaxy will turn newlines into spaces
+rpstblastn
+-query "$query"
+#if $db_opts.db_opts_selector == "db":
+ -db "${db_opts.database.fields.path}"
+#elif $db_opts.db_opts_selector == "histdb":
+ -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
+#end if
+-evalue $evalue_cutoff
+-out "$output1"
+##Set the extended list here so if/when we add things, saved workflows are not affected
+#if str($out_format)=="ext":
+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
+#else:
+ -outfmt $out_format
+#end if
+##Seems rpstblastn does not currently support multiple threads :(
+##-num_threads 8
+#if $adv_opts.adv_opts_selector=="advanced":
+$adv_opts.filter_query
+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
+## Note -max_target_seqs overrides -num_descriptions and -num_alignments
+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
+-max_target_seqs $adv_opts.max_hits
+#end if
+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
+-word_size $adv_opts.word_size
+#end if
+$adv_opts.parse_deflines
+## End of advanced options:
+#end if
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: warningmark
+
+**Note**. Database searches may take a substantial amount of time.
+For large input datasets it is advisable to allow overnight processing.
+
+-----
+
+**What it does**
+
+Search a *protein domain database* using a *translated nucleotide query*,
+using the NCBI BLAST+ rpstblastn command line tool.
+
+The protein domain databases use position-specific scoring matrices
+(PSSMs) and are available for a number of domain collections including:
+
+*CDD* - NCBI curated meta-collection of domains, see
+http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#NCBI_curated_domains
+
+*Kog* - PSSMs from automatically aligned sequences and sequence
+fragments classified in the KOGs resource, the eukaryotic
+counterpart to COGs, see http://www.ncbi.nlm.nih.gov/COG/new/
+
+*Cog* - PSSMs from automatically aligned sequences and sequence
+fragments classified in the COGs resource, which focuses primarily
+on prokaryotes, see http://www.ncbi.nlm.nih.gov/COG/new/
+
+*Pfam* - PSSMs from Pfam-A seed alignment database, see
+http://pfam.sanger.ac.uk/
+
+*Smart* - PSSMs from SMART domain alignment database, see
+http://smart.embl-heidelberg.de/
+
+*Tigr* - PSSMs from TIGRFAM database of protein families, see
+http://www.jcvi.org/cms/research/projects/tigrfams/overview/
+
+*Prk* - PSSMs from automatically aligned stable clusters in the
+Protein Clusters database, see
+http://www.ncbi.nlm.nih.gov/proteinclusters?cmd=search&db=proteinclusters
+
+The exact list of domain databases offered will depend on how your
+local Galaxy has been configured.
+
+-----
+
+**Output format**
+
+Because Galaxy focuses on processing tabular data, the default output of this
+tool is tabular. The standard BLAST+ tabular output contains 12 columns:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+ 1 qseqid Query Seq-id (ID of your sequence)
+ 2 sseqid Subject Seq-id (ID of the database hit)
+ 3 pident Percentage of identical matches
+ 4 length Alignment length
+ 5 mismatch Number of mismatches
+ 6 gapopen Number of gap openings
+ 7 qstart Start of alignment in query
+ 8 qend End of alignment in query
+ 9 sstart Start of alignment in subject (database hit)
+ 10 send End of alignment in subject (database hit)
+ 11 evalue Expectation value (E-value)
+ 12 bitscore Bit score
+====== ========= ============================================
+
+The BLAST+ tools can optionally output additional columns of information,
+but this takes longer to calculate. Most (but not all) of these columns are
+included by selecting the extended tabular output. The extra columns are
+included *after* the standard 12 columns. This is so that you can write
+workflow filtering steps that accept either the 12 or 24 column tabular
+BLAST output. Galaxy now uses this extended 24 column output by default.
+
+====== ============= ===========================================
+Column NCBI name Description
+------ ------------- -------------------------------------------
+ 13 sallseqid All subject Seq-id(s), separated by a ';'
+ 14 score Raw score
+ 15 nident Number of identical matches
+ 16 positive Number of positive-scoring matches
+ 17 gaps Total number of gaps
+ 18 ppos Percentage of positive-scoring matches
+ 19 qframe Query frame
+ 20 sframe Subject frame
+ 21 qseq Aligned part of query sequence
+ 22 sseq Aligned part of subject sequence
+ 23 qlen Query sequence length
+ 24 slen Subject sequence length
+====== ============= ===========================================
+
+The third option is BLAST XML output, which is designed to be parsed by
+another program, and is understood by some Galaxy tools.
+
+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
+
+-------
+
+**References**
+
+Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W327-31.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
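A common filtering step on the tabular output described above is keeping only the best-scoring domain hit per query. A minimal Python sketch, assuming tab-separated 12 or 24 column lines where column 1 is `qseqid` and column 12 is `bitscore`. The function name is illustrative, and ties keep the first hit seen.

```python
def best_hit_per_query(lines):
    """Map each qseqid to the fields of its highest bit score hit.

    Works on 12 or 24 column output because both columns used
    (1 = qseqid, 12 = bitscore) are among the standard 12.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, bitscore = fields[0], float(fields[11])
        # Strictly greater, so on a tie the earlier hit is kept
        if qseqid not in best or bitscore > float(best[qseqid][11]):
            best[qseqid] = fields
    return best
```

Comparing bit scores rather than e-values avoids ties among very small e-values that are printed as 0.0.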
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_tblastn_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_tblastn_wrapper.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,340 @@
+
+ Search translated nucleotide database with protein query sequence(s)
+
+
+
+ tblastn
+ blast+
+
+ tblastn -version
+
+## The command is a Cheetah template which allows some Python based syntax.
+## Lines starting with two hashes are comments. Galaxy will turn newlines into spaces
+tblastn
+-query "$query"
+#if $db_opts.db_opts_selector == "db":
+ -db "${db_opts.database.fields.path}"
+#elif $db_opts.db_opts_selector == "histdb":
+ -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
+#else:
+ -subject "$db_opts.subject"
+#end if
+-evalue $evalue_cutoff
+-out "$output1"
+##Set the extended list here so if/when we add things, saved workflows are not affected
+#if str($out_format)=="ext":
+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
+#else:
+ -outfmt $out_format
+#end if
+-num_threads 8
+#if $adv_opts.adv_opts_selector=="advanced":
+-db_gencode $adv_opts.db_gencode
+$adv_opts.filter_query
+-matrix $adv_opts.matrix
+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
+## Note -max_target_seqs overrides -num_descriptions and -num_alignments
+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
+-max_target_seqs $adv_opts.max_hits
+#end if
+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
+-word_size $adv_opts.word_size
+#end if
+##Ungapped disabled for now - see comments below
+##$adv_opts.ungapped
+$adv_opts.parse_deflines
+## End of advanced options:
+#end if
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: warningmark
+
+**Note**. Database searches may take a substantial amount of time.
+For large input datasets it is advisable to allow overnight processing.
+
+-----
+
+**What it does**
+
+Search a *translated nucleotide database* using a *protein query*,
+using the NCBI BLAST+ tblastn command line tool.
+
+.. class:: warningmark
+
+You can also search against a FASTA file of subject nucleotide
+sequences. This is *not* advised because it is slower (only one
+CPU is used) and, more importantly, the e-values are calculated for
+a pairwise search (giving very small e-values which look overly significant).
+In most cases you should instead turn the other FASTA file into a
+database first using *makeblastdb* and search against that.
+
+-----
+
+**Output format**
+
+Because Galaxy focuses on processing tabular data, the default output of this
+tool is tabular. The standard BLAST+ tabular output contains 12 columns:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+ 1 qseqid Query Seq-id (ID of your sequence)
+ 2 sseqid Subject Seq-id (ID of the database hit)
+ 3 pident Percentage of identical matches
+ 4 length Alignment length
+ 5 mismatch Number of mismatches
+ 6 gapopen Number of gap openings
+ 7 qstart Start of alignment in query
+ 8 qend End of alignment in query
+ 9 sstart Start of alignment in subject (database hit)
+ 10 send End of alignment in subject (database hit)
+ 11 evalue Expectation value (E-value)
+ 12 bitscore Bit score
+====== ========= ============================================
+
+The BLAST+ tools can optionally output additional columns of information,
+but this takes longer to calculate. Most (but not all) of these columns are
+included by selecting the extended tabular output. The extra columns are
+included *after* the standard 12 columns. This is so that you can write
+workflow filtering steps that accept either the 12 or 24 column tabular
+BLAST output. Galaxy now uses this extended 24 column output by default.
+
+====== ============= ===========================================
+Column NCBI name Description
+------ ------------- -------------------------------------------
+ 13 sallseqid All subject Seq-id(s), separated by a ';'
+ 14 score Raw score
+ 15 nident Number of identical matches
+ 16 positive Number of positive-scoring matches
+ 17 gaps Total number of gaps
+ 18 ppos Percentage of positive-scoring matches
+ 19 qframe Query frame
+ 20 sframe Subject frame
+ 21 qseq Aligned part of query sequence
+ 22 sseq Aligned part of subject sequence
+ 23 qlen Query sequence length
+ 24 slen Subject sequence length
+====== ============= ===========================================
+
+The third option is BLAST XML output, which is designed to be parsed by
+another program, and is understood by some Galaxy tools.
+
+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
+
+-------
+
+**References**
+
+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
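The extended tabular output described above includes `qlen` (column 23), so together with `qstart` and `qend` (columns 7 and 8) you can estimate how much of each query a single HSP covers. A Python sketch under that assumption, with a hypothetical function name. For tblastn the query is protein, so the values are residues. `abs()` is used because hits on the minus strand of a nucleotide query can report `qstart` greater than `qend`.

```python
def query_coverage(line):
    """Percent of the query covered by one HSP, from 24 column output.

    Uses qstart (column 7), qend (column 8) and qlen (column 23);
    coordinates are 1-based and inclusive, hence the + 1.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 23:
        raise ValueError("Needs the extended 24 column output (qlen is column 23)")
    qstart, qend, qlen = int(fields[6]), int(fields[7]), int(fields[22])
    aligned = abs(qend - qstart) + 1
    return 100.0 * aligned / qlen
```

This is per HSP only: a query matched by several non-overlapping HSPs to the same subject would need their intervals merged first.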
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_tblastx_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/ncbi_tblastx_wrapper.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,294 @@
+
+ Search translated nucleotide database with translated nucleotide query sequence(s)
+
+
+
+ tblastx
+ blast+
+
+ tblastx -version
+
+## The command is a Cheetah template which allows some Python based syntax.
+## Lines starting with two hashes are comments. Galaxy will turn newlines into spaces
+tblastx
+-query "$query"
+#if $db_opts.db_opts_selector == "db":
+ -db "${db_opts.database.fields.path}"
+#elif $db_opts.db_opts_selector == "histdb":
+ -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
+#else:
+ -subject "$db_opts.subject"
+#end if
+-query_gencode $query_gencode
+-evalue $evalue_cutoff
+-out "$output1"
+##Set the extended list here so if/when we add things, saved workflows are not affected
+#if str($out_format)=="ext":
+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
+#else:
+ -outfmt $out_format
+#end if
+-num_threads 8
+#if $adv_opts.adv_opts_selector=="advanced":
+-db_gencode $adv_opts.db_gencode
+$adv_opts.filter_query
+$adv_opts.strand
+-matrix $adv_opts.matrix
+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
+## Note -max_target_seqs overrides -num_descriptions and -num_alignments
+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
+-max_target_seqs $adv_opts.max_hits
+#end if
+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
+-word_size $adv_opts.word_size
+#end if
+$adv_opts.parse_deflines
+## End of advanced options:
+#end if
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: warningmark
+
+**Note**. Database searches may take a substantial amount of time.
+For large input datasets it is advisable to allow overnight processing.
+
+-----
+
+**What it does**
+
+Search a *translated nucleotide database* using a *translated nucleotide query*,
+using the NCBI BLAST+ tblastx command line tool.
+
+.. class:: warningmark
+
+You can also search against a FASTA file of subject nucleotide
+sequences. This is *not* advised because it is slower (only one
+CPU is used) but, more importantly, it gives e-values as for pairwise
+searches, which are much smaller and so look more significant than
+the e-values from an equivalent database search.
+In most cases you should instead turn the other FASTA file into a
+database first using *makeblastdb* and search against that.
+
+-----
+
+**Output format**
+
+Because Galaxy focuses on processing tabular data, the default output of this
+tool is tabular. The standard BLAST+ tabular output contains 12 columns:
+
+====== ========= ============================================
+Column NCBI name Description
+------ --------- --------------------------------------------
+     1 qseqid    Query Seq-id (ID of your sequence)
+     2 sseqid    Subject Seq-id (ID of the database hit)
+     3 pident    Percentage of identical matches
+     4 length    Alignment length
+     5 mismatch  Number of mismatches
+     6 gapopen   Number of gap openings
+     7 qstart    Start of alignment in query
+     8 qend      End of alignment in query
+     9 sstart    Start of alignment in subject (database hit)
+    10 send      End of alignment in subject (database hit)
+    11 evalue    Expectation value (E-value)
+    12 bitscore  Bit score
+====== ========= ============================================
+
+The BLAST+ tools can optionally output additional columns of information,
+but this takes longer to calculate. Most (but not all) of these columns are
+included by selecting the extended tabular output. The extra columns are
+included *after* the standard 12 columns. This is so that you can write
+workflow filtering steps that accept either the 12 or 24 column tabular
+BLAST output. Galaxy now uses this extended 24 column output by default.
+
+====== ============= ===========================================
+Column NCBI name     Description
+------ ------------- -------------------------------------------
+    13 sallseqid     All subject Seq-id(s), separated by a ';'
+    14 score         Raw score
+    15 nident        Number of identical matches
+    16 positive      Number of positive-scoring matches
+    17 gaps          Total number of gaps
+    18 ppos          Percentage of positive-scoring matches
+    19 qframe        Query frame
+    20 sframe        Subject frame
+    21 qseq          Aligned part of query sequence
+    22 sseq          Aligned part of subject sequence
+    23 qlen          Query sequence length
+    24 slen          Subject sequence length
+====== ============= ===========================================
+
+The third option is BLAST XML output, which is designed to be parsed by
+another program, and is understood by some Galaxy tools.
+
+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
+The two query-anchored outputs show a multiple sequence alignment between the query and all the matches,
+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
+
+-------
+
+**References**
+
+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
+
+This wrapper is available to install into other Galaxy instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
+
+
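The extended `-outfmt` string in the tblastx template requests the 12 `std` columns followed by 12 extra columns. The help text says downstream filters can accept either width. The following is a minimal Python sketch of such a filter. It assumes plain tab-separated `-outfmt 6` style rows. The function names `parse_blast_tabular` and `strong_hits` and the default thresholds are hypothetical and chosen only for illustration:

```python
import io

# "std" expands to these 12 columns; the wrapper's extended -outfmt string
# "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
# appends the next 12.
STD_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
EXT_COLUMNS = STD_COLUMNS + ["sallseqid", "score", "nident", "positive", "gaps",
                             "ppos", "qframe", "sframe", "qseq", "sseq",
                             "qlen", "slen"]


def parse_blast_tabular(handle):
    """Yield one dict per row of 12 or 24 column BLAST+ tabular output."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        row = line.split("\t")
        if len(row) == len(STD_COLUMNS):
            names = STD_COLUMNS
        elif len(row) == len(EXT_COLUMNS):
            names = EXT_COLUMNS
        else:
            raise ValueError("Expected 12 or 24 columns, got %i" % len(row))
        yield dict(zip(names, row))


def strong_hits(handle, min_pident=90.0, max_evalue=1e-10):
    """Filter on columns 3 and 11 only, so either output width works."""
    for hit in parse_blast_tabular(handle):
        if float(hit["pident"]) >= min_pident and float(hit["evalue"]) <= max_evalue:
            yield hit


example = io.StringIO("q1\ts1\t95.00\t100\t5\t0\t1\t100\t1\t100\t1e-50\t180\n")
for hit in strong_hits(example):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"], sep="\t")
```

The filter only touches the first 12 columns, so a saved workflow step like this would keep working whether the search was run with standard or extended tabular output.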
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/repository_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/repository_dependencies.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,5 @@
+
+
+
+
+
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ncbi_blast_plus/tool_dependencies.xml Tue Jul 30 07:33:46 2013 -0400
@@ -0,0 +1,20 @@
+
+
+
+
+
+ ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.26/ncbi-blast-2.2.26+-src.tar.gz
+ cd c++ && ./configure --prefix=$INSTALL_DIR && make && make install
+
+ $INSTALL_DIR/bin
+
+
+
+
+Downloads and compiles BLAST+ from the NCBI, which assumes you have
+all the required build dependencies installed. See:
+http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
+
+
+
+
diff -r c1a6e5aefee0 -r 688f3fb09a6a test-data/blastx_sample.xml
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/blastxml_to_tabular.py
--- a/tools/ncbi_blast_plus/blastxml_to_tabular.py Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,261 +0,0 @@
-#!/usr/bin/env python
-"""Convert a BLAST XML file to tabular output.
-
-Takes three command line options, input BLAST XML filename, output tabular
-BLAST filename, output format (std for standard 12 columns, or ext for the
-extended 24 columns offered in the BLAST+ wrappers).
-
-The 12 columns output are 'qseqid sseqid pident length mismatch gapopen qstart
-qend sstart send evalue bitscore' or 'std' at the BLAST+ command line, which
-mean:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
- 1 qseqid Query Seq-id (ID of your sequence)
- 2 sseqid Subject Seq-id (ID of the database hit)
- 3 pident Percentage of identical matches
- 4 length Alignment length
- 5 mismatch Number of mismatches
- 6 gapopen Number of gap openings
- 7 qstart Start of alignment in query
- 8 qend End of alignment in query
- 9 sstart Start of alignment in subject (database hit)
- 10 send End of alignment in subject (database hit)
- 11 evalue Expectation value (E-value)
- 12 bitscore Bit score
-====== ========= ============================================
-
-The additional columns offered in the Galaxy BLAST+ wrappers are:
-
-====== ============= ===========================================
-Column NCBI name Description
------- ------------- -------------------------------------------
- 13 sallseqid All subject Seq-id(s), separated by a ';'
- 14 score Raw score
- 15 nident Number of identical matches
- 16 positive Number of positive-scoring matches
- 17 gaps Total number of gaps
- 18 ppos Percentage of positive-scoring matches
- 19 qframe Query frame
- 20 sframe Subject frame
- 21 qseq Aligned part of query sequence
- 22 sseq Aligned part of subject sequence
- 23 qlen Query sequence length
- 24 slen Subject sequence length
-====== ============= ===========================================
-
-Most of these fields are given explicitly in the XML file; others, like
-the percentage identity and the number of gap openings, must be calculated.
-
-Be aware that the sequence in the extended tabular output or XML direct from
-BLAST+ may or may not use XXXX masking on regions of low complexity. This
-can throw off the calculation of percentage identity and gap openings.
-[In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,
-with these numbers changing depending on whether or not the low complexity
-filter is used.]
-
-This script attempts to produce identical output to what BLAST+ would have done.
-However, check this with "diff -b ..." since BLAST+ sometimes includes an extra
-space character (probably a bug).
-"""
-import sys
-import re
-
-if "-v" in sys.argv or "--version" in sys.argv:
- print "v0.0.12"
- sys.exit(0)
-
-if sys.version_info[:2] >= ( 2, 5 ):
- try:
- from xml.etree import cElementTree as ElementTree
- except ImportError:
- from xml.etree import ElementTree as ElementTree
-else:
- from galaxy import eggs
- import pkg_resources; pkg_resources.require( "elementtree" )
- from elementtree import ElementTree
-
-def stop_err( msg ):
- sys.stderr.write("%s\n" % msg)
- sys.exit(1)
-
-#Parse Command Line
-try:
- in_file, out_file, out_fmt = sys.argv[1:]
-except:
- stop_err("Expect 3 arguments: input BLAST XML file, output tabular file, out format (std or ext)")
-
-if out_fmt == "std":
- extended = False
-elif out_fmt == "x22":
- stop_err("Format argument x22 has been replaced with ext (extended 24 columns)")
-elif out_fmt == "ext":
- extended = True
-else:
- stop_err("Format argument should be std (12 column) or ext (extended 24 columns)")
-
-
-# get an iterable
-try:
- context = ElementTree.iterparse(in_file, events=("start", "end"))
-except:
- stop_err("Invalid data format.")
-# turn it into an iterator
-context = iter(context)
-# get the root element
-try:
- event, root = context.next()
-except:
- stop_err( "Invalid data format." )
-
-
-re_default_query_id = re.compile("^Query_\d+$")
-assert re_default_query_id.match("Query_101")
-assert not re_default_query_id.match("Query_101a")
-assert not re_default_query_id.match("MyQuery_101")
-re_default_subject_id = re.compile("^Subject_\d+$")
-assert re_default_subject_id.match("Subject_1")
-assert not re_default_subject_id.match("Subject_")
-assert not re_default_subject_id.match("Subject_12a")
-assert not re_default_subject_id.match("TheSubject_1")
-
-
-outfile = open(out_file, 'w')
-blast_program = None
-for event, elem in context:
- if event == "end" and elem.tag == "BlastOutput_program":
- blast_program = elem.text
- # for every tag
- if event == "end" and elem.tag == "Iteration":
- #Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA
- # sp|Q9BS26|ERP44_HUMAN
- # Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
- # 406
- #
- #
- #Or, from BLAST 2.2.24+ run online
- # Query_1
- # Sample
- # 516
- # ...
- qseqid = elem.findtext("Iteration_query-ID")
- if re_default_query_id.match(qseqid):
- #Place holder ID, take the first word of the query definition
- qseqid = elem.findtext("Iteration_query-def").split(None,1)[0]
- qlen = int(elem.findtext("Iteration_query-len"))
-
- # for every within
- for hit in elem.findall("Iteration_hits/Hit"):
- #Expecting either this,
- # gi|3024260|sp|P56514.1|OPSD_BUFBU
- # RecName: Full=Rhodopsin
- # P56514
- #or,
- # Subject_1
- # gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]
- # Subject_1
- #
- #apparently depending on the parse_deflines switch
- sseqid = hit.findtext("Hit_id").split(None,1)[0]
- hit_def = sseqid + " " + hit.findtext("Hit_def")
- if re_default_subject_id.match(sseqid) \
- and sseqid == hit.findtext("Hit_accession"):
- #Place holder ID, take the first word of the subject definition
- hit_def = hit.findtext("Hit_def")
- sseqid = hit_def.split(None,1)[0]
- # for every within
- for hsp in hit.findall("Hit_hsps/Hsp"):
- nident = hsp.findtext("Hsp_identity")
- length = hsp.findtext("Hsp_align-len")
- pident = "%0.2f" % (100*float(nident)/float(length))
-
- q_seq = hsp.findtext("Hsp_qseq")
- h_seq = hsp.findtext("Hsp_hseq")
- m_seq = hsp.findtext("Hsp_midline")
- assert len(q_seq) == len(h_seq) == len(m_seq) == int(length)
- gapopen = str(len(q_seq.replace('-', ' ').split())-1 + \
- len(h_seq.replace('-', ' ').split())-1)
-
- mismatch = m_seq.count(' ') + m_seq.count('+') \
- - q_seq.count('-') - h_seq.count('-')
- #TODO - Remove this alternative mismatch calculation and test
- #once satisfied there are no problems
- expected_mismatch = len(q_seq) \
- - sum(1 for q,h in zip(q_seq, h_seq) \
- if q == h or q == "-" or h == "-")
- xx = sum(1 for q,h in zip(q_seq, h_seq) if q=="X" and h=="X")
- if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= expected_mismatch + xx):
- stop_err("%s vs %s mismatches, expected %i <= %i <= %i" \
- % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),
- int(mismatch), expected_mismatch))
-
- #TODO - Remove this alternative identity calculation and test
- #once satisfied there are no problems
- expected_identity = sum(1 for q,h in zip(q_seq, h_seq) if q == h)
- if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):
- stop_err("%s vs %s identities, expected %i <= %i <= %i" \
- % (qseqid, sseqid, expected_identity, int(nident),
- expected_identity + q_seq.count("X")))
-
-
- evalue = hsp.findtext("Hsp_evalue")
- if evalue == "0":
- evalue = "0.0"
- else:
- evalue = "%0.0e" % float(evalue)
-
- bitscore = float(hsp.findtext("Hsp_bit-score"))
- if bitscore < 100:
- #Seems to show one decimal place for lower scores
- bitscore = "%0.1f" % bitscore
- else:
- #Note BLAST does not round to nearest int, it truncates
- bitscore = "%i" % bitscore
-
- values = [qseqid,
- sseqid,
- pident,
- length, #hsp.findtext("Hsp_align-len")
- str(mismatch),
- gapopen,
- hsp.findtext("Hsp_query-from"), #qstart,
- hsp.findtext("Hsp_query-to"), #qend,
- hsp.findtext("Hsp_hit-from"), #sstart,
- hsp.findtext("Hsp_hit-to"), #send,
- evalue, #hsp.findtext("Hsp_evalue") in scientific notation
- bitscore, #hsp.findtext("Hsp_bit-score") rounded
- ]
-
- if extended:
- sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(">"))
- #print hit_def, "-->", sallseqid
- positive = hsp.findtext("Hsp_positive")
- ppos = "%0.2f" % (100*float(positive)/float(length))
- qframe = hsp.findtext("Hsp_query-frame")
- sframe = hsp.findtext("Hsp_hit-frame")
- if blast_program == "blastp":
- #Probably a bug in BLASTP that they use 0 or 1 depending on format
- if qframe == "0": qframe = "1"
- if sframe == "0": sframe = "1"
- slen = int(hit.findtext("Hit_len"))
- values.extend([sallseqid,
- hsp.findtext("Hsp_score"), #score,
- nident,
- positive,
- hsp.findtext("Hsp_gaps"), #gaps,
- ppos,
- qframe,
- sframe,
- #NOTE - for blastp, XML shows original seq, tabular uses XXX masking
- q_seq,
- h_seq,
- str(qlen),
- str(slen),
- ])
- #print "\t".join(values)
- outfile.write("\t".join(values) + "\n")
- # prevents ElementTree from growing large datastructure
- root.clear()
- elem.clear()
-outfile.close()
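The percentage identity, gap openings and mismatches are not given directly in the BLAST XML. The script above therefore derives them from the aligned query, hit and midline strings. It also reformats the e-value and bit score to mimic BLAST+ tabular output. Below is a Python 3 sketch of just those calculations, following the same rules as the script. The names `hsp_stats`, `format_evalue` and `format_bitscore` are hypothetical. Like the script, it assumes the midline has a space at gap columns and that an HSP does not start or end with a gap. It ignores the XXXX low-complexity masking caveats:

```python
def hsp_stats(q_seq, h_seq, midline, nident):
    """Return (pident, gapopen, mismatch) for one HSP, as the script computes them."""
    length = len(q_seq)
    if not (len(h_seq) == len(midline) == length):
        raise ValueError("Aligned strings must all be the same length")
    pident = "%0.2f" % (100.0 * nident / length)
    # Each internal run of '-' is one gap opening; splitting a sequence on its
    # gaps gives (number of runs + 1) pieces.
    gapopen = (len(q_seq.replace("-", " ").split()) - 1
               + len(h_seq.replace("-", " ").split()) - 1)
    # The midline has ' ' or '+' wherever the residues differ, and ' ' at gap
    # columns too, so subtract the gap columns to leave true mismatches.
    mismatch = (midline.count(" ") + midline.count("+")
                - q_seq.count("-") - h_seq.count("-"))
    return pident, gapopen, mismatch


def format_evalue(evalue_text):
    """Mimic the tabular e-value style: '0.0', else one significant figure."""
    if evalue_text == "0":
        return "0.0"
    return "%0.0e" % float(evalue_text)


def format_bitscore(bits):
    """One decimal place below 100, otherwise truncated (not rounded) to an int."""
    if bits < 100:
        return "%0.1f" % bits
    return str(int(bits))
```

For example, `hsp_stats("MKV-LA", "MKIQLS", "MK+ L+", 3)` gives `("50.00", 1, 2)`. That is one gap opening in the query and two aligned but non-identical residue pairs (V/I and A/S).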
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/blastxml_to_tabular.xml
--- a/tools/ncbi_blast_plus/blastxml_to_tabular.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,137 +0,0 @@
-
- Convert BLAST XML output to tabular
- blastxml_to_tabular.py --version
-
- blastxml_to_tabular.py $blastxml_file $tabular_file $out_format
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of
-formats including tabular and a more detailed XML format. A complex workflow
-may need both the XML and the tabular output - but running BLAST twice is
-slow and wasteful.
-
-This tool takes the BLAST XML output and can convert it into the
-standard 12 column tabular equivalent:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
- 1 qseqid Query Seq-id (ID of your sequence)
- 2 sseqid Subject Seq-id (ID of the database hit)
- 3 pident Percentage of identical matches
- 4 length Alignment length
- 5 mismatch Number of mismatches
- 6 gapopen Number of gap openings
- 7 qstart Start of alignment in query
- 8 qend End of alignment in query
- 9 sstart Start of alignment in subject (database hit)
- 10 send End of alignment in subject (database hit)
- 11 evalue Expectation value (E-value)
- 12 bitscore Bit score
-====== ========= ============================================
-
-The BLAST+ tools can optionally output additional columns of information,
-but this takes longer to calculate. Most (but not all) of these columns are
-included by selecting the extended tabular output. The extra columns are
-included *after* the standard 12 columns. This is so that you can write
-workflow filtering steps that accept either the 12 or 24 column tabular
-BLAST output. This tool now uses this extended 24 column output by default.
-
-====== ============= ===========================================
-Column NCBI name Description
------- ------------- -------------------------------------------
- 13 sallseqid All subject Seq-id(s), separated by a ';'
- 14 score Raw score
- 15 nident Number of identical matches
- 16 positive Number of positive-scoring matches
- 17 gaps Total number of gaps
- 18 ppos Percentage of positive-scoring matches
- 19 qframe Query frame
- 20 sframe Subject frame
- 21 qseq Aligned part of query sequence
- 22 sseq Aligned part of subject sequence
- 23 qlen Query sequence length
- 24 slen Subject sequence length
-====== ============= ===========================================
-
-Beware that the XML file (and thus the conversion) and the tabular output
-direct from BLAST+ may differ in the presence of XXXX masking on regions
-of low complexity (columns 21 and 22), and thus also calculated figures like
-the percentage identity (column 3).
-
-**References**
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
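The converter streams the XML with ElementTree's `iterparse` rather than loading the whole file. The following is a cut-down Python 3 sketch of that loop. It yields only four of the twelve standard columns. The function name `iter_hsps` is hypothetical. The placeholder-subject handling and the column calculations of the real script are omitted. The `SAMPLE` document is an invented, heavily trimmed fragment for demonstration only:

```python
import io
import re
import xml.etree.ElementTree as ElementTree

# BLAST+ may report a placeholder such as Query_1 as the query ID, in which
# case the script uses the first word of the query definition instead.
PLACEHOLDER_QUERY = re.compile(r"^Query_\d+$")


def iter_hsps(handle):
    """Yield (qseqid, sseqid, evalue, bitscore) for every HSP in BLAST XML."""
    for _event, elem in ElementTree.iterparse(handle, events=("end",)):
        if elem.tag != "Iteration":
            continue  # wait until a whole query's results have been parsed
        qseqid = elem.findtext("Iteration_query-ID")
        if PLACEHOLDER_QUERY.match(qseqid):
            qseqid = elem.findtext("Iteration_query-def").split(None, 1)[0]
        for hit in elem.findall("Iteration_hits/Hit"):
            sseqid = hit.findtext("Hit_id").split(None, 1)[0]
            for hsp in hit.findall("Hit_hsps/Hsp"):
                yield (qseqid, sseqid,
                       float(hsp.findtext("Hsp_evalue")),
                       float(hsp.findtext("Hsp_bit-score")))
        elem.clear()  # drop this Iteration's children to limit memory use


# Invented, heavily trimmed BLAST XML fragment for demonstration only.
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-ID>Query_1</Iteration_query-ID>
      <Iteration_query-def>ERP44 Endoplasmic reticulum resident protein 44</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>45.5</Hsp_bit-score>
              <Hsp_evalue>1e-05</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

for row in iter_hsps(io.BytesIO(SAMPLE)):
    print(*row, sep="\t")
```

Waiting for the `end` event of each `Iteration` means every hit for one query is available at once. Clearing it afterwards keeps large multi-query reports from accumulating in memory.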
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blast_plus.txt
--- a/tools/ncbi_blast_plus/ncbi_blast_plus.txt Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,151 +0,0 @@
-Galaxy wrappers for NCBI BLAST+ suite
-=====================================
-
-These wrappers are copyright 2010-2013 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below.
-
-Currently tested with NCBI BLAST 2.2.26+ (i.e. version 2.2.26 of BLAST+),
-and does not work with the NCBI 'legacy' BLAST suite (e.g. blastall).
-
-Note that these wrappers (and the associated datatypes) were originally
-distributed as part of the main Galaxy repository, but as of August 2012
-moved to the Galaxy Tool Shed as 'ncbi_blast_plus' (and 'blast_datatypes').
-My thanks to Dannon Baker from the Galaxy development team for his assistance
-with this.
-
-These wrappers are available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
-Automated Installation
-======================
-
-Galaxy should be able to automatically install the dependencies, i.e. the
-'blast_datatypes' repository which defines the BLAST XML file format
-('blastxml') and protein and nucleotide BLAST databases ('blastdbp' and
-'blastdbn').
-
-You must tell Galaxy about any system level BLAST databases using configuration
-files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein
-databases like NR), and blastdb_d.loc (protein domain databases like CDD or
-SMART) which are located in the tool-data/ folder. Sample files are included
-which explain the tab-based format to use.
-
-You can download the NCBI provided databases as tar-balls from here:
-ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (nucleotide and protein databases like NR)
-ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ (domain databases like CDD)
-
-
-Manual Installation
-===================
-
-For those not using Galaxy's automated installation from the Tool Shed, put
-the XML and Python files in the tools/ncbi_blast_plus/ folder and add the XML
-files to your tool_conf.xml as normal (and do the same in tool_conf.xml.sample
-in order to run the unit tests). For example, use:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-You will also need to install 'blast_datatypes' from the Tool Shed. This
-defines the BLAST XML file format ('blastxml') and protein and nucleotide
-BLAST databases composite file formats ('blastdbp' and 'blastdbn').
-
-As described above for an automated installation, you must also tell Galaxy
-about any system level BLAST databases using the tool-data/blastdb*.loc files.
-
-You must install the NCBI BLAST+ standalone tools somewhere on the system
-path. Currently the unit tests are written using "BLAST 2.2.26+".
-
-Run the functional tests (adjusting the section identifier to match your
-tool_conf.xml.sample file):
-
-./run_functional_tests.sh -sid NCBI_BLAST+-ncbi_blast_plus_tools
-
-
-History
-=======
-
-v0.0.11 - Final revision as part of the Galaxy main repository, and the
- first release via the Tool Shed
-v0.0.12 - Implements genetic code option for translation searches.
- - Changes to 1000 sequences at a time (to cope with
- very large sets of queries where BLAST+ can become memory hungry)
- - Include warning that BLAST+ with subject FASTA gives pairwise
- e-values
-v0.0.13 - Use the new error handling options in Galaxy (the previously
- bundled hide_stderr.py script is no longer needed).
-v0.0.14 - Support for makeblastdb and blastdbinfo with local BLAST databases
- in the history (using work from Edward Kirton), requires v0.0.14
- of the 'blast_datatypes' repository from the Tool Shed.
-v0.0.15 - Stronger warning in help text against searching against subject
- FASTA files (better looking e-values than you might be expecting).
-v0.0.16 - Added repository_dependencies.xml for automated installation of the
- 'blast_datatypes' repository from the Tool Shed.
-v0.0.17 - The BLAST+ search tools now default to extended tabular output
- (all too often our users were having to re-run searches just to
- get one of the missing columns like query or subject length)
-v0.0.18 - Defensive quoting of filenames in case of spaces (where possible,
- BLAST+ handling of some multi-file arguments is problematic).
-v0.0.19 - Added wrappers for rpsblast and rpstblastn, and new blastdb_d.loc
- for the domain databases they use (e.g. CDD, PFAM or SMART).
- - Correct case of exception regular expression (for error handling
- fall-back in case the return code is not set properly).
- - Clearer naming of output files.
-v0.0.20 - Added unit tests for BLASTN and TBLASTX.
- - Fallback on ElementTree if cElementTree missing in XML to tabular.
- - Link to Tool Shed added to help text and this documentation.
- - Tweak dependency on blast_datatypes to also work on Test Tool Shed
-
-
-Developers
-==========
-
-This script and related tools are being developed on the 'tools' branch of the
-following Mercurial repository:
-https://bitbucket.org/peterjc/galaxy-central/
-
-For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball I use
-the following command from the Galaxy root folder:
-
-$ ./tools/ncbi_blast_plus/make_ncbi_blast_plus.sh
-
-This simplifies ensuring a consistent set of files is bundled each time,
-including all the relevant test files.
-
-
-Licence (MIT/BSD style)
-=======================
-
-Permission to use, copy, modify, and distribute this software and its
-documentation with or without modifications and for any purpose and
-without fee is hereby granted, provided that any copyright notices
-appear in all copies and that both those copyright notices and this
-permission notice appear in supporting documentation, and that the
-names of the contributors or copyright holders not be used in
-advertising or publicity pertaining to distribution of the software
-without specific prior permission.
-
-THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
-WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
-CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
-OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
-OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
-OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
-OR PERFORMANCE OF THIS SOFTWARE.
-
-NOTE: This is the licence for the Galaxy Wrapper only. NCBI BLAST+ and
-associated data files are available and licenced separately.
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastdbcmd_info.xml
--- a/tools/ncbi_blast_plus/ncbi_blastdbcmd_info.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,67 +0,0 @@
-
- Show BLAST database information from blastdbcmd
-
- blastdbcmd
- blast+
-
- blastdbcmd -version
-
-blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}" -info -out "$info"
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-Calls the NCBI BLAST+ blastdbcmd command line tool with the -info
-switch to give summary information about a BLAST database, such as
-the size (number of sequences and total length) and date.
-
--------
-
-**References**
-
-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
-
-Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,139 +0,0 @@
-
- Extract sequence(s) from BLAST database
-
- blastdbcmd
- blast+
-
- blastdbcmd -version
-
-## The command is a Cheetah template which allows some Python based syntax.
-## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces.
-blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}"
-
-##TODO: What about -ctrl_a and -target_only as advanced options?
-
-#if $id_opts.id_type=="file":
--entry_batch "$id_opts.entries"
-#else:
-##Perform some simple search/replaces to remove whitespace
-##and make it comma separated, and escape any pipe characters
--entry "$id_opts.entries.replace('\r',',').replace('\n',',').replace(' ','').replace(',,',',').replace(',,',',').strip(',').replace('|','\|')"
-#end if
-
-##When building a BLAST database, to ensure unique IDs makeblastdb will
-##do things like turning a FASTA entry with ID of ERP44 into lcl|ERP44
-##(if using -parse_seqids) or simply assign it an ID using the record
-##number like gnl|BL_ORD_ID|123 (to cope with duplicate IDs in the FASTA
-##file). In -parse_seqids mode, a duplicate FASTA ID gives an error.
-##
-##The BLAST plain text and XML output will contain these BLAST IDs, but
-##the tabular output does not (at least, not in BLAST 2.2.25+).
-##Therefore in general, Galaxy users won't care about the (internal)
-##BLAST identifiers.
-##
-##The blastdbcmd FASTA output will also contain these IDs, but in the
-##context of the BLAST tabular output they are not helpful. Therefore
-##to recover the original ID as used in the FASTA file for makeblastdb
-##we need a little post-processing.
-##
-##We remove the NCBI's lcl|... or gnl|BL_ORD_ID|123 prefixes
-##using sed, however the exact syntax differs for Mac OS X's sed
-
-#if str($outfmt)=="blastid":
--out "$seq"
-#else if sys.platform == "darwin":
-| sed -E 's/^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )/>/1' > "$seq"
-#else:
-| sed 's/>\(lcl|\|gnl|BL_ORD_ID|[0-9]* \)/>/1' > "$seq"
-#end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-Extracts FASTA formatted sequences from a BLAST database
-using the NCBI BLAST+ blastdbcmd command line tool.
-
-.. class:: warningmark
-
-**BLAST assigned identifiers**
-
-When a BLAST database is constructed from a FASTA file, the
-original identifiers can be replaced with BLAST assigned
-identifiers, partly to ensure uniqueness. For example, sometimes
-a prefix of 'lcl|' is added (lcl is short for local),
-or an arbitrary name starting 'gnl|BL_ORD_ID|' is created.
-
-If you are using the tabular output from BLAST, it will contain
-the original identifiers, not the BLAST assigned identifiers
-suitable for use with the blastdbcmd tool.
-
-If you are using the XML or plain text output, this will
-contain the BLAST assigned identifiers. However, extracting a
-list of identifiers from these formats isn't straightforward.
-
--------
-
-**References**
-
-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
-
-Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
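The blastdbcmd template does two bits of string munging. First, it turns a pasted list of IDs into the comma-separated, pipe-escaped value for `-entry`. Second, it uses sed to strip the `lcl|` or `gnl|BL_ORD_ID|N ` prefixes from FASTA titles. The same logic is sketched in Python below for clarity. The function names are hypothetical. Unlike the chained `.replace(',,', ',')` calls, the regular expression collapses any number of consecutive separators. The prefix pattern is anchored at the start of the line, as in the Mac OS X sed variant:

```python
import re


def normalise_entries(text):
    """Comma separate a pasted ID list for blastdbcmd -entry, escaping '|'.

    Newlines and commas act as separators; spaces are deleted (as in the
    template), and empty entries are dropped.
    """
    ids = [part.replace(" ", "") for part in re.split(r"[\r\n,]+", text)]
    return ",".join(part for part in ids if part).replace("|", "\\|")


# Same alternation as the sed expressions: 'lcl|' or 'gnl|BL_ORD_ID|<digits> '
BLAST_PREFIX = re.compile(r"^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )")


def strip_blast_prefix(line):
    """Remove a BLAST assigned lcl| or gnl|BL_ORD_ID|N prefix from a FASTA title."""
    return BLAST_PREFIX.sub(">", line, count=1)
```

`strip_blast_prefix` would be applied to each line of the blastdbcmd FASTA output. Sequence lines never start with `>`, so they pass through unchanged.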
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastn_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_blastn_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,253 +0,0 @@
-
- Search nucleotide database with nucleotide query sequence(s)
-
-
-
- blastn
- blast+
-
- blastn -version
-
-## The command is a Cheetah template which allows some Python based syntax.
-## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces.
-blastn
--query "$query"
-#if $db_opts.db_opts_selector == "db":
- -db "${db_opts.database.fields.path}"
-#elif $db_opts.db_opts_selector == "histdb":
- -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
-#else:
- -subject "$db_opts.subject"
-#end if
--task $blast_type
--evalue $evalue_cutoff
--out "$output1"
-##Set the extended list here so if/when we add things, saved workflows are not affected
-#if str($out_format)=="ext":
- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
-#else:
- -outfmt $out_format
-#end if
--num_threads 8
-#if $adv_opts.adv_opts_selector=="advanced":
-$adv_opts.filter_query
-$adv_opts.strand
-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
-## Note -max_target_seqs overrides -num_descriptions and -num_alignments
-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
--max_target_seqs $adv_opts.max_hits
-#end if
-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
--word_size $adv_opts.word_size
-#end if
-$adv_opts.ungapped
-$adv_opts.parse_deflines
-## End of advanced options:
-#end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: warningmark
-
-**Note**. Database searches may take a substantial amount of time.
-For large input datasets it is advisable to allow overnight processing.
-
------
-
-**What it does**
-
-Search a *nucleotide database* using a *nucleotide query*,
-using the NCBI BLAST+ blastn command line tool.
-Algorithms include blastn, megablast, and discontiguous megablast.
-
-.. class:: warningmark
-
-You can also search against a FASTA file of subject nucleotide
-sequences. This is *not* advised because it is slower (only one
-CPU is used), but more importantly gives e-values for pairwise
-searches (very small e-values which will look overly significant).
-In most cases you should instead turn the other FASTA file into a
-database first using *makeblastdb* and search against that.
-
------
-
-**Output format**
-
-Because Galaxy focuses on processing tabular data, the default output of this
-tool is tabular. The standard BLAST+ tabular output contains 12 columns:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
- 1 qseqid Query Seq-id (ID of your sequence)
- 2 sseqid Subject Seq-id (ID of the database hit)
- 3 pident Percentage of identical matches
- 4 length Alignment length
- 5 mismatch Number of mismatches
- 6 gapopen Number of gap openings
- 7 qstart Start of alignment in query
- 8 qend End of alignment in query
- 9 sstart Start of alignment in subject (database hit)
- 10 send End of alignment in subject (database hit)
- 11 evalue Expectation value (E-value)
- 12 bitscore Bit score
-====== ========= ============================================
-
-The BLAST+ tools can optionally output additional columns of information,
-but this takes longer to calculate. Most (but not all) of these columns are
-included by selecting the extended tabular output. The extra columns are
-included *after* the standard 12 columns. This is so that you can write
-workflow filtering steps that accept either the 12 or 24 column tabular
-BLAST output. Galaxy now uses this extended 24 column output by default.
-
-====== ============= ===========================================
-Column NCBI name Description
------- ------------- -------------------------------------------
- 13 sallseqid All subject Seq-id(s), separated by a ';'
- 14 score Raw score
- 15 nident Number of identical matches
- 16 positive Number of positive-scoring matches
- 17 gaps Total number of gaps
- 18 ppos Percentage of positive-scoring matches
- 19 qframe Query frame
- 20 sframe Subject frame
- 21 qseq Aligned part of query sequence
- 22 sseq Aligned part of subject sequence
- 23 qlen Query sequence length
- 24 slen Subject sequence length
-====== ============= ===========================================
-
-The third option is BLAST XML output, which is designed to be parsed by
-another program, and is understood by some Galaxy tools.
-
-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
-
--------
-
-**References**
-
-Zhang et al. A Greedy Algorithm for Aligning DNA Sequences. 2000. J Comput Biol. 7:203-214.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
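The 12/24 column layout described in the help text means a downstream filter can rely on the first 12 columns and treat columns 13 to 24 as optional. Below is a minimal Python sketch of such a filter. It is not part of these wrappers, and the function name `parse_blast_tabular` and its `max_evalue` parameter are hypothetical. It assumes tab-separated BLAST+ tabular output without a header; lines starting with `#` (as in the commented tabular format) are skipped just in case.

```python
import io

# Column names from the help tables: the 12 standard ("std") columns,
# then the 12 extra columns of the extended (24 column) output.
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXTRA_COLUMNS = [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen",
]


def parse_blast_tabular(handle, max_evalue=None):
    """Yield one dict per hit from 12 or 24 column BLAST+ tabular output."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = line.split("\t")
        if len(row) == 12:
            names = STD_COLUMNS
        elif len(row) == 24:
            names = STD_COLUMNS + EXTRA_COLUMNS
        else:
            raise ValueError("Expected 12 or 24 columns, got %i" % len(row))
        hit = dict(zip(names, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if "sallseqid" in hit:
            # Column 13 holds all subject IDs separated by ';'
            hit["sallseqid"] = hit["sallseqid"].split(";")
        if max_evalue is None or hit["evalue"] <= max_evalue:
            yield hit


# Example: a single made-up 12 column line, filtered on e-value
example = "q1\ts1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-50\t370\n"
for hit in parse_blast_tabular(io.StringIO(example), max_evalue=1e-10):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])  # q1 s1 1e-50
```

Because the e-value filter only touches column 11, the same step works unchanged whether the upstream tool produced the standard or the extended tabular output.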
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastp_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_blastp_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,308 +0,0 @@
-
- Search protein database with protein query sequence(s)
-
-
-
- blastp
- blast+
-
- blastp -version
-
-## The command is a Cheetah template which allows some Python based syntax.
-## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces
-blastp
--query "$query"
-#if $db_opts.db_opts_selector == "db":
- -db "${db_opts.database.fields.path}"
-#elif $db_opts.db_opts_selector == "histdb":
- -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
-#else:
- -subject "$db_opts.subject"
-#end if
--task $blast_type
--evalue $evalue_cutoff
--out "$output1"
-##Set the extended list here so if/when we add things, saved workflows are not affected
-#if str($out_format)=="ext":
- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
-#else:
- -outfmt $out_format
-#end if
--num_threads 8
-#if $adv_opts.adv_opts_selector=="advanced":
-$adv_opts.filter_query
--matrix $adv_opts.matrix
-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
-## Note -max_target_seqs overrides -num_descriptions and -num_alignments
-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
--max_target_seqs $adv_opts.max_hits
-#end if
-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
--word_size $adv_opts.word_size
-#end if
-##Ungapped disabled for now - see comments below
-##$adv_opts.ungapped
-$adv_opts.parse_deflines
-## End of advanced options:
-#end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: warningmark
-
-**Note**. Database searches may take a substantial amount of time.
-For large input datasets it is advisable to allow overnight processing.
-
------
-
-**What it does**
-
-Search a *protein database* using a *protein query*,
-using the NCBI BLAST+ blastp command line tool.
-
-.. class:: warningmark
-
-You can also search against a FASTA file of subject protein
-sequences. This is *not* advised because it is slower (only one
-CPU is used) and, more importantly, gives e-values for pairwise
-searches (very small e-values which will look overly significant).
-In most cases you should instead turn the other FASTA file into a
-database first using *makeblastdb* and search against that.
-
------
-
-**Output format**
-
-Because Galaxy focuses on processing tabular data, the default output of this
-tool is tabular. The standard BLAST+ tabular output contains 12 columns:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
- 1 qseqid Query Seq-id (ID of your sequence)
- 2 sseqid Subject Seq-id (ID of the database hit)
- 3 pident Percentage of identical matches
- 4 length Alignment length
- 5 mismatch Number of mismatches
- 6 gapopen Number of gap openings
- 7 qstart Start of alignment in query
- 8 qend End of alignment in query
- 9 sstart Start of alignment in subject (database hit)
- 10 send End of alignment in subject (database hit)
- 11 evalue Expectation value (E-value)
- 12 bitscore Bit score
-====== ========= ============================================
-
-The BLAST+ tools can optionally output additional columns of information,
-but this takes longer to calculate. Most (but not all) of these columns are
-included by selecting the extended tabular output. The extra columns are
-included *after* the standard 12 columns. This is so that you can write
-workflow filtering steps that accept either the 12 or 24 column tabular
-BLAST output. Galaxy now uses this extended 24 column output by default.
-
-====== ============= ===========================================
-Column NCBI name Description
------- ------------- -------------------------------------------
- 13 sallseqid All subject Seq-id(s), separated by a ';'
- 14 score Raw score
- 15 nident Number of identical matches
- 16 positive Number of positive-scoring matches
- 17 gaps Total number of gaps
- 18 ppos Percentage of positive-scoring matches
- 19 qframe Query frame
- 20 sframe Subject frame
- 21 qseq Aligned part of query sequence
- 22 sseq Aligned part of subject sequence
- 23 qlen Query sequence length
- 24 slen Subject sequence length
-====== ============= ===========================================
-
-The third option is BLAST XML output, which is designed to be parsed by
-another program, and is understood by some Galaxy tools.
-
-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
-
--------
-
-**References**
-
-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
-
-Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
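The Cheetah command templates above turn Galaxy form values into a BLAST+ command line, including the `int(str(...)) > 0` guard on the optional `-max_target_seqs` and `-word_size` arguments. As a rough illustration only, here is a simplified Python equivalent for blastp that builds an argument list. The function `build_blastp_args`, its parameter names and its default values are hypothetical, and the advanced options are flattened (no `-matrix` or `-parse_deflines`). `shlex.join` needs Python 3.8 or later.

```python
import shlex

# Extended tabular format string, as used by the wrappers' "ext" option
EXT_OUTFMT = ("6 std sallseqid score nident positive gaps ppos "
              "qframe sframe qseq sseq qlen slen")


def build_blastp_args(query, output, db_path=None, subject=None,
                      task="blastp", evalue=0.001, out_format="ext",
                      max_hits="", word_size="", threads=8):
    """Return a blastp argument list, loosely mirroring the Cheetah template."""
    if (db_path is None) == (subject is None):
        raise ValueError("Give exactly one of db_path or subject")
    args = ["blastp", "-query", query]
    if db_path is not None:
        args += ["-db", db_path]
    else:
        # Searching a FASTA file directly is slower (single CPU)
        args += ["-subject", subject]
    args += ["-task", task, "-evalue", str(evalue), "-out", output]
    args += ["-outfmt", EXT_OUTFMT if out_format == "ext" else str(out_format)]
    args += ["-num_threads", str(threads)]
    # Same test as the template: a non-empty value that is greater than zero
    if str(max_hits) and int(str(max_hits)) > 0:
        args += ["-max_target_seqs", str(max_hits)]
    if str(word_size) and int(str(word_size)) > 0:
        args += ["-word_size", str(word_size)]
    return args


args = build_blastp_args("query.fasta", "hits.tabular",
                         db_path="/data/blastdb/nr", max_hits="10")
print(shlex.join(args))  # shell-quoted form, for display only
```

Keeping the command as a list means it can be passed straight to `subprocess.run` without a shell, so filenames never need quoting.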
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastx_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_blastx_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,294 +0,0 @@
-
- Search protein database with translated nucleotide query sequence(s)
-
-
-
- blastx
- blast+
-
- blastx -version
-
-## The command is a Cheetah template which allows some Python based syntax.
-## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces
-blastx
--query "$query"
-#if $db_opts.db_opts_selector == "db":
- -db "${db_opts.database.fields.path}"
-#elif $db_opts.db_opts_selector == "histdb":
- -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
-#else:
- -subject "$db_opts.subject"
-#end if
--query_gencode $query_gencode
--evalue $evalue_cutoff
--out "$output1"
-##Set the extended list here so if/when we add things, saved workflows are not affected
-#if str($out_format)=="ext":
- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
-#else:
- -outfmt $out_format
-#end if
--num_threads 8
-#if $adv_opts.adv_opts_selector=="advanced":
-$adv_opts.filter_query
-$adv_opts.strand
--matrix $adv_opts.matrix
-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
-## Note -max_target_seqs overrides -num_descriptions and -num_alignments
-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
--max_target_seqs $adv_opts.max_hits
-#end if
-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
--word_size $adv_opts.word_size
-#end if
-$adv_opts.ungapped
-$adv_opts.parse_deflines
-## End of advanced options:
-#end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: warningmark
-
-**Note**. Database searches may take a substantial amount of time.
-For large input datasets it is advisable to allow overnight processing.
-
------
-
-**What it does**
-
-Search a *protein database* using a *translated nucleotide query*,
-using the NCBI BLAST+ blastx command line tool.
-
-.. class:: warningmark
-
-You can also search against a FASTA file of subject protein
-sequences. This is *not* advised because it is slower (only one
-CPU is used) and, more importantly, gives e-values for pairwise
-searches (very small e-values which will look overly significant).
-In most cases you should instead turn the other FASTA file into a
-database first using *makeblastdb* and search against that.
-
------
-
-**Output format**
-
-Because Galaxy focuses on processing tabular data, the default output of this
-tool is tabular. The standard BLAST+ tabular output contains 12 columns:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
- 1 qseqid Query Seq-id (ID of your sequence)
- 2 sseqid Subject Seq-id (ID of the database hit)
- 3 pident Percentage of identical matches
- 4 length Alignment length
- 5 mismatch Number of mismatches
- 6 gapopen Number of gap openings
- 7 qstart Start of alignment in query
- 8 qend End of alignment in query
- 9 sstart Start of alignment in subject (database hit)
- 10 send End of alignment in subject (database hit)
- 11 evalue Expectation value (E-value)
- 12 bitscore Bit score
-====== ========= ============================================
-
-The BLAST+ tools can optionally output additional columns of information,
-but this takes longer to calculate. Most (but not all) of these columns are
-included by selecting the extended tabular output. The extra columns are
-included *after* the standard 12 columns. This is so that you can write
-workflow filtering steps that accept either the 12 or 24 column tabular
-BLAST output. Galaxy now uses this extended 24 column output by default.
-
-====== ============= ===========================================
-Column NCBI name Description
------- ------------- -------------------------------------------
- 13 sallseqid All subject Seq-id(s), separated by a ';'
- 14 score Raw score
- 15 nident Number of identical matches
- 16 positive Number of positive-scoring matches
- 17 gaps Total number of gaps
- 18 ppos Percentage of positive-scoring matches
- 19 qframe Query frame
- 20 sframe Subject frame
- 21 qseq Aligned part of query sequence
- 22 sseq Aligned part of subject sequence
- 23 qlen Query sequence length
- 24 slen Subject sequence length
-====== ============= ===========================================
-
-The third option is BLAST XML output, which is designed to be parsed by
-another program, and is understood by some Galaxy tools.
-
-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
-
--------
-
-**References**
-
-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
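blastx searches with a *translated* nucleotide query, and the `qframe` and `sframe` tabular columns report which reading frame a match came from. To illustrate what translating in six frames means, here is a small Python sketch using only the standard genetic code (NCBI table 1), whereas the wrapper lets you pick other tables via `-query_gencode`. The frame numbering used here (+1 to +3 starting at offsets 0 to 2 of the sequence, -1 to -3 at offsets 0 to 2 of its reverse complement) is an assumption for illustration, and the helper names are hypothetical.

```python
BASES = "TCAG"
# NCBI genetic code table 1 (standard code), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an upper case DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate whole codons; any codon not in the table becomes 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Map frame numbers 1, 2, 3, -1, -2, -3 to protein translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCC"))
# {1: 'MA', -1: 'GH', 2: 'W', -2: 'A', 3: 'G', -3: 'P'}
```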
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_makeblastdb.xml
--- a/tools/ncbi_blast_plus/ncbi_makeblastdb.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,129 +0,0 @@
-
- Make BLAST database
-
- makeblastdb
- blast+
-
- makeblastdb -version
-
-makeblastdb -out "${os.path.join($outfile.extra_files_path,'blastdb')}"
-$parse_seqids
-$hash_index
-## Single call to -in with multiple filenames space separated with outer quotes
-## (presumably any filenames with spaces would be a problem). Note this gives
-## some extra spaces, e.g. -in " file1 file2 file3 " but BLAST seems happy:
--in "
-#for $i in $in
-${i.file} #end for
-"
-#if $title:
--title "$title"
-#else:
-##Would default to being based on the cryptic Galaxy filenames, which is unhelpful
--title "BLAST Database"
-#end if
--dbtype $dbtype
-## #set $sep = '-mask_data '
-## #for $i in $mask_data
-## $sep${i.file}
-## #set $sep = ', '
-## #end for
-## #set $sep = '-gi_mask -gi_mask_name '
-## #for $i in $gi_mask
-## $sep${i.file}
-## #set $sep = ', '
-## #end for
-## #if $tax.select == 'id':
-## -taxid $tax.id
-## #else if $tax.select == 'map':
-## -taxid_map $tax.map
-## #end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-Make BLAST database from one or more FASTA files and/or BLAST databases.
-
-This is a wrapper for the NCBI BLAST+ tool 'makeblastdb', which is the
-replacement for the 'formatdb' tool in the NCBI 'legacy' BLAST suite.
-
-
-
-**Documentation**
-
-http://www.ncbi.nlm.nih.gov/books/NBK1763/
-
-**References**
-
-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
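The makeblastdb template passes every input FASTA file in a single quoted `-in` argument, and its comment suspects that filenames containing spaces would be a problem. Here is a minimal Python sketch of the same assembly that rejects such filenames up front rather than producing an ambiguous list. `build_makeblastdb_args` is a hypothetical helper; it assumes `$dbtype` is `nucl` or `prot`, and that the `$parse_seqids` and `$hash_index` parameters expand to the `-parse_seqids` and `-hash_index` flags when selected.

```python
import os


def build_makeblastdb_args(fasta_files, extra_files_path, dbtype,
                           title=None, parse_seqids=False, hash_index=False):
    """Return a makeblastdb argument list, loosely mirroring the template."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    for name in fasta_files:
        # The -in value is a space separated list, so whitespace is ambiguous
        if any(ch.isspace() for ch in name):
            raise ValueError("Filename contains whitespace: %r" % name)
    args = ["makeblastdb", "-out", os.path.join(extra_files_path, "blastdb")]
    if parse_seqids:
        args.append("-parse_seqids")
    if hash_index:
        args.append("-hash_index")
    # A single -in argument holding all the filenames, separated by spaces
    args += ["-in", " ".join(fasta_files)]
    # Default title, rather than one based on cryptic Galaxy filenames
    args += ["-title", title or "BLAST Database"]
    args += ["-dbtype", dbtype]
    return args


print(build_makeblastdb_args(["a.fasta", "b.fasta"], "/tmp/db_files", "prot"))
```

Joining with single spaces avoids the stray leading and trailing spaces the template comment mentions, though BLAST reportedly tolerates those anyway.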
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_rpsblast_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_rpsblast_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,238 +0,0 @@
-
- Search protein domain database (PSSMs) with protein query sequence(s)
-
-
-
- rpsblast
- blast+
-
- rpsblast -version
-
-## The command is a Cheetah template which allows some Python based syntax.
-## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces
-rpsblast
--query "$query"
-#if $db_opts.db_opts_selector == "db":
- -db "${db_opts.database.fields.path}"
-#elif $db_opts.db_opts_selector == "histdb":
- -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
-#end if
--evalue $evalue_cutoff
--out "$output1"
-##Set the extended list here so if/when we add things, saved workflows are not affected
-#if str($out_format)=="ext":
- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
-#else:
- -outfmt $out_format
-#end if
--num_threads 8
-#if $adv_opts.adv_opts_selector=="advanced":
-$adv_opts.filter_query
-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
-## Note -max_target_seqs overrides -num_descriptions and -num_alignments
-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
--max_target_seqs $adv_opts.max_hits
-#end if
-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
--word_size $adv_opts.word_size
-#end if
-$adv_opts.parse_deflines
-## End of advanced options:
-#end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: warningmark
-
-**Note**. Database searches may take a substantial amount of time.
-For large input datasets it is advisable to allow overnight processing.
-
------
-
-**What it does**
-
-Search a *protein domain database* using a *protein query*,
-using the NCBI BLAST+ rpsblast command line tool.
-
-The protein domain databases use position-specific scoring matrices
-(PSSMs) and are available for a number of domain collections including:
-
-*CDD* - NCBI curated meta-collection of domains, see
-http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#NCBI_curated_domains
-
-*Kog* - PSSMs from automatically aligned sequences and sequence
-fragments classified in the KOGs resource, the eukaryotic
-counterpart to COGs, see http://www.ncbi.nlm.nih.gov/COG/new/
-
-*Cog* - PSSMs from automatically aligned sequences and sequence
-fragments classified in the COGs resource, which focuses primarily
-on prokaryotes, see http://www.ncbi.nlm.nih.gov/COG/new/
-
-*Pfam* - PSSMs from Pfam-A seed alignment database, see
-http://pfam.sanger.ac.uk/
-
-*Smart* - PSSMs from SMART domain alignment database, see
-http://smart.embl-heidelberg.de/
-
-*Tigr* - PSSMs from TIGRFAM database of protein families, see
-http://www.jcvi.org/cms/research/projects/tigrfams/overview/
-
-*Prk* - PSSMs from automatically aligned stable clusters in the
-Protein Clusters database, see
-http://www.ncbi.nlm.nih.gov/proteinclusters?cmd=search&db=proteinclusters
-
-The exact list of domain databases offered will depend on how your
-local Galaxy has been configured.
-
------
-
-**Output format**
-
-Because Galaxy focuses on processing tabular data, the default output of this
-tool is tabular. The standard BLAST+ tabular output contains 12 columns:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
- 1 qseqid Query Seq-id (ID of your sequence)
- 2 sseqid Subject Seq-id (ID of the database hit)
- 3 pident Percentage of identical matches
- 4 length Alignment length
- 5 mismatch Number of mismatches
- 6 gapopen Number of gap openings
- 7 qstart Start of alignment in query
- 8 qend End of alignment in query
- 9 sstart Start of alignment in subject (database hit)
- 10 send End of alignment in subject (database hit)
- 11 evalue Expectation value (E-value)
- 12 bitscore Bit score
-====== ========= ============================================
-
-The BLAST+ tools can optionally output additional columns of information,
-but this takes longer to calculate. Most (but not all) of these columns are
-included by selecting the extended tabular output. The extra columns are
-included *after* the standard 12 columns. This is so that you can write
-workflow filtering steps that accept either the 12 or 24 column tabular
-BLAST output. Galaxy now uses this extended 24 column output by default.
-
-====== ============= ===========================================
-Column NCBI name Description
------- ------------- -------------------------------------------
- 13 sallseqid All subject Seq-id(s), separated by a ';'
- 14 score Raw score
- 15 nident Number of identical matches
- 16 positive Number of positive-scoring matches
- 17 gaps Total number of gaps
- 18 ppos Percentage of positive-scoring matches
- 19 qframe Query frame
- 20 sframe Subject frame
- 21 qseq Aligned part of query sequence
- 22 sseq Aligned part of subject sequence
- 23 qlen Query sequence length
- 24 slen Subject sequence length
-====== ============= ===========================================
-
-The third option is BLAST XML output, which is designed to be parsed by
-another program, and is understood by some Galaxy tools.
-
-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
-
--------
-
-**References**
-
-Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W327-31.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
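The help text notes that BLAST XML output is designed to be parsed by another program. As one hedged example, Python's standard `xml.etree.ElementTree` can pull out the first hit for each query. `best_hits_from_blast_xml` is a hypothetical helper; the sketch assumes the usual BLAST XML element names (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_id`, `Hsp`, `Hsp_evalue`) and that BLAST lists each query's hits in order of significance. The sample below is hand-written to show the layout, not real rpsblast output.

```python
import io
import xml.etree.ElementTree as ET


def best_hits_from_blast_xml(source):
    """Map each query definition line to (hit ID, e-value) of its first hit.

    source may be a filename or a file object; queries without hits map
    to None.
    """
    best = {}
    for iteration in ET.parse(source).getroot().iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hit = iteration.find("Iteration_hits/Hit")  # first listed hit
        if hit is None:
            best[query] = None
            continue
        hsp = hit.find("Hit_hsps/Hsp")
        best[query] = (hit.findtext("Hit_id"), float(hsp.findtext("Hsp_evalue")))
    return best


# A tiny hand-written fragment showing the element layout relied on above
SAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>query1</Iteration_query-def><Iteration_hits>
<Hit><Hit_id>gnl|CDD|123</Hit_id><Hit_hsps><Hsp><Hsp_evalue>1e-20</Hsp_evalue></Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration>
<Iteration><Iteration_query-def>query2</Iteration_query-def><Iteration_hits></Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>"""

print(best_hits_from_blast_xml(io.StringIO(SAMPLE)))
# {'query1': ('gnl|CDD|123', 1e-20), 'query2': None}
```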
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,239 +0,0 @@
-
- Search protein domain database (PSSMs) with translated nucleotide query sequence(s)
-
-
-
- rpstblastn
- blast+
-
- rpstblastn -version
-
-## The command is a Cheetah template which allows some Python based syntax.
-## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces
-rpstblastn
--query "$query"
-#if $db_opts.db_opts_selector == "db":
- -db "${db_opts.database.fields.path}"
-#elif $db_opts.db_opts_selector == "histdb":
- -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
-#end if
--evalue $evalue_cutoff
--out "$output1"
-##Set the extended list here so if/when we add things, saved workflows are not affected
-#if str($out_format)=="ext":
- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
-#else:
- -outfmt $out_format
-#end if
-##Seems rpstblastn does not currently support multiple threads :(
-##-num_threads 8
-#if $adv_opts.adv_opts_selector=="advanced":
-$adv_opts.filter_query
-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
-## Note -max_target_seqs overrides -num_descriptions and -num_alignments
-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
--max_target_seqs $adv_opts.max_hits
-#end if
-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
--word_size $adv_opts.word_size
-#end if
-$adv_opts.parse_deflines
-## End of advanced options:
-#end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: warningmark
-
-**Note**. Database searches may take a substantial amount of time.
-For large input datasets it is advisable to allow overnight processing.
-
------
-
-**What it does**
-
-Search a *protein domain database* using a *translated nucleotide query*,
-using the NCBI BLAST+ rpstblastn command line tool.
-
-The protein domain databases use position-specific scoring matrices
-(PSSMs) and are available for a number of domain collections including:
-
-*CDD* - NCBI curated meta-collection of domains, see
-http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#NCBI_curated_domains
-
-*Kog* - PSSMs from automatically aligned sequences and sequence
-fragments classified in the KOGs resource, the eukaryotic
-counterpart to COGs, see http://www.ncbi.nlm.nih.gov/COG/new/
-
-*Cog* - PSSMs from automatically aligned sequences and sequence
-fragments classified in the COGs resource, which focuses primarily
-on prokaryotes, see http://www.ncbi.nlm.nih.gov/COG/new/
-
-*Pfam* - PSSMs from Pfam-A seed alignment database, see
-http://pfam.sanger.ac.uk/
-
-*Smart* - PSSMs from SMART domain alignment database, see
-http://smart.embl-heidelberg.de/
-
-*Tigr* - PSSMs from TIGRFAM database of protein families, see
-http://www.jcvi.org/cms/research/projects/tigrfams/overview/
-
-*Prk* - PSSMs from automatically aligned stable clusters in the
-Protein Clusters database, see
-http://www.ncbi.nlm.nih.gov/proteinclusters?cmd=search&db=proteinclusters
-
-The exact list of domain databases offered will depend on how your
-local Galaxy has been configured.
-
------
-
-**Output format**
-
-Because Galaxy focuses on processing tabular data, the default output of this
-tool is tabular. The standard BLAST+ tabular output contains 12 columns:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
- 1 qseqid Query Seq-id (ID of your sequence)
- 2 sseqid Subject Seq-id (ID of the database hit)
- 3 pident Percentage of identical matches
- 4 length Alignment length
- 5 mismatch Number of mismatches
- 6 gapopen Number of gap openings
- 7 qstart Start of alignment in query
- 8 qend End of alignment in query
- 9 sstart Start of alignment in subject (database hit)
- 10 send End of alignment in subject (database hit)
- 11 evalue Expectation value (E-value)
- 12 bitscore Bit score
-====== ========= ============================================
-
-The BLAST+ tools can optionally output additional columns of information,
-but this takes longer to calculate. Most (but not all) of these columns are
-included by selecting the extended tabular output. The extra columns are
-included *after* the standard 12 columns. This is so that you can write
-workflow filtering steps that accept either the 12 or 24 column tabular
-BLAST output. Galaxy now uses this extended 24 column output by default.
-
-====== ============= ===========================================
-Column NCBI name Description
------- ------------- -------------------------------------------
- 13 sallseqid All subject Seq-id(s), separated by a ';'
- 14 score Raw score
- 15 nident Number of identical matches
- 16 positive Number of positive-scoring matches
- 17 gaps Total number of gaps
- 18 ppos Percentage of positive-scoring matches
- 19 qframe Query frame
- 20 sframe Subject frame
- 21 qseq Aligned part of query sequence
- 22 sseq Aligned part of subject sequence
- 23 qlen Query sequence length
- 24 slen Subject sequence length
-====== ============= ===========================================
-
-The third option is BLAST XML output, which is designed to be parsed by
-another program, and is understood by some Galaxy tools.
-
-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
-
--------
-
-**References**
-
-Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W327-31.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_tblastn_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_tblastn_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,340 +0,0 @@
-
- Search translated nucleotide database with protein query sequence(s)
-
-
-
- tblastn
- blast+
-
- tblastn -version
-
-## The command is a Cheetah template which allows some Python based syntax.
-## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces
-tblastn
--query "$query"
-#if $db_opts.db_opts_selector == "db":
- -db "${db_opts.database.fields.path}"
-#elif $db_opts.db_opts_selector == "histdb":
- -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
-#else:
- -subject "$db_opts.subject"
-#end if
--evalue $evalue_cutoff
--out "$output1"
-##Set the extended list here so if/when we add things, saved workflows are not affected
-#if str($out_format)=="ext":
- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
-#else:
- -outfmt $out_format
-#end if
--num_threads 8
-#if $adv_opts.adv_opts_selector=="advanced":
--db_gencode $adv_opts.db_gencode
-$adv_opts.filter_query
--matrix $adv_opts.matrix
-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
-## Note -max_target_seqs overrides -num_descriptions and -num_alignments
-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
--max_target_seqs $adv_opts.max_hits
-#end if
-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
--word_size $adv_opts.word_size
-#end if
-##Ungapped disabled for now - see comments below
-##$adv_opts.ungapped
-$adv_opts.parse_deflines
-## End of advanced options:
-#end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: warningmark
-
-**Note**. Database searches may take a substantial amount of time.
-For large input datasets it is advisable to allow overnight processing.
-
------
-
-**What it does**
-
-Search a *translated nucleotide database* using a *protein query*,
-using the NCBI BLAST+ tblastn command line tool.
-
-.. class:: warningmark
-
-You can also search against a FASTA file of subject nucleotide
-sequences. This is *not* advised because it is slower (only one
-CPU is used), but more importantly because the e-values are those
-of pairwise searches, so they are very small and will look overly
-significant. In most cases you should instead turn the other FASTA
-file into a database first using *makeblastdb* and search against that.
-
------
-
-**Output format**
-
-Because Galaxy focuses on processing tabular data, the default output of this
-tool is tabular. The standard BLAST+ tabular output contains 12 columns:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
-     1 qseqid    Query Seq-id (ID of your sequence)
-     2 sseqid    Subject Seq-id (ID of the database hit)
-     3 pident    Percentage of identical matches
-     4 length    Alignment length
-     5 mismatch  Number of mismatches
-     6 gapopen   Number of gap openings
-     7 qstart    Start of alignment in query
-     8 qend      End of alignment in query
-     9 sstart    Start of alignment in subject (database hit)
-    10 send      End of alignment in subject (database hit)
-    11 evalue    Expectation value (E-value)
-    12 bitscore  Bit score
-====== ========= ============================================
-
-The BLAST+ tools can optionally output additional columns of information,
-but this takes longer to calculate. Most (but not all) of these columns are
-included by selecting the extended tabular output. The extra columns are
-included *after* the standard 12 columns. This is so that you can write
-workflow filtering steps that accept either the 12 or 24 column tabular
-BLAST output. Galaxy now uses this extended 24 column output by default.
-
-====== ============= ===========================================
-Column NCBI name     Description
------- ------------- -------------------------------------------
-    13 sallseqid     All subject Seq-id(s), separated by a ';'
-    14 score         Raw score
-    15 nident        Number of identical matches
-    16 positive      Number of positive-scoring matches
-    17 gaps          Total number of gaps
-    18 ppos          Percentage of positive-scoring matches
-    19 qframe        Query frame
-    20 sframe        Subject frame
-    21 qseq          Aligned part of query sequence
-    22 sseq          Aligned part of subject sequence
-    23 qlen          Query sequence length
-    24 slen          Subject sequence length
-====== ============= ===========================================
-
-The third option is BLAST XML output, which is designed to be parsed by
-another program, and is understood by some Galaxy tools.
-
-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
-
--------
-
-**References**
-
-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
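The tabular output described in the help text above is designed to be consumed by downstream steps. As a minimal sketch (not part of these wrappers), the Python below maps each row of 12- or 24-column output onto the NCBI column names from the tables. The function name `parse_blast_tabular` and the choice of which fields to convert to numbers are illustrative assumptions.

```python
import io
from typing import Dict, Iterator, List, TextIO, Union

# Column names in the order given in the two help-text tables.
STD_COLUMNS: List[str] = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXTRA_COLUMNS: List[str] = [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen",
]
# Assumed numeric types; everything else is kept as a string.
INT_FIELDS = {
    "length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send",
    "score", "nident", "positive", "gaps", "qframe", "sframe", "qlen", "slen",
}
FLOAT_FIELDS = {"pident", "evalue", "bitscore", "ppos"}

Value = Union[str, int, float]


def parse_blast_tabular(handle: TextIO) -> Iterator[Dict[str, Value]]:
    """Yield one dict per hit from 12- or 24-column BLAST+ tabular output."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        # The column count tells us which layout we are reading.
        if len(fields) == 12:
            names = STD_COLUMNS
        elif len(fields) == 24:
            names = STD_COLUMNS + EXTRA_COLUMNS
        else:
            raise ValueError(f"expected 12 or 24 columns, got {len(fields)}")
        hit: Dict[str, Value] = {}
        for name, value in zip(names, fields):
            if name in INT_FIELDS:
                hit[name] = int(value)
            elif name in FLOAT_FIELDS:
                hit[name] = float(value)
            else:
                hit[name] = value
        yield hit


if __name__ == "__main__":
    demo = "q1\ts1\t98.50\t200\t3\t0\t1\t200\t101\t300\t1e-50\t350\n"
    for hit in parse_blast_tabular(io.StringIO(demo)):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

Dispatching on the column count is what lets a single filtering step accept either layout, since the extra columns always follow the standard 12.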
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_tblastx_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_tblastx_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,294 +0,0 @@
-
- Search translated nucleotide database with translated nucleotide query sequence(s)
-
-
-
- tblastx
- blast+
-
- tblastx -version
-
-## The command is a Cheetah template which allows some Python based syntax.
-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces
-tblastx
--query "$query"
-#if $db_opts.db_opts_selector == "db":
- -db "${db_opts.database.fields.path}"
-#elif $db_opts.db_opts_selector == "histdb":
- -db "${os.path.join($db_opts.histdb.extra_files_path,'blastdb')}"
-#else:
- -subject "$db_opts.subject"
-#end if
--query_gencode $query_gencode
--evalue $evalue_cutoff
--out "$output1"
-##Set the extended list here so if/when we add things, saved workflows are not affected
-#if str($out_format)=="ext":
- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"
-#else:
- -outfmt $out_format
-#end if
--num_threads 8
-#if $adv_opts.adv_opts_selector=="advanced":
--db_gencode $adv_opts.db_gencode
-$adv_opts.filter_query
-$adv_opts.strand
--matrix $adv_opts.matrix
-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
-## Note -max_target_seqs overrides -num_descriptions and -num_alignments
-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
--max_target_seqs $adv_opts.max_hits
-#end if
-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
--word_size $adv_opts.word_size
-#end if
-$adv_opts.parse_deflines
-## End of advanced options:
-#end if
-
-.. class:: warningmark
-
-**Note**. Database searches may take a substantial amount of time.
-For large input datasets it is advisable to allow overnight processing.
-
------
-
-**What it does**
-
-Search a *translated nucleotide database* using a *translated nucleotide query*,
-using the NCBI BLAST+ tblastx command line tool.
-
-.. class:: warningmark
-
-You can also search against a FASTA file of subject nucleotide
-sequences. This is *not* advised because it is slower (only one
-CPU is used), but more importantly because the e-values are those
-of pairwise searches, so they are very small and will look overly
-significant. In most cases you should instead turn the other FASTA
-file into a database first using *makeblastdb* and search against that.
-
------
-
-**Output format**
-
-Because Galaxy focuses on processing tabular data, the default output of this
-tool is tabular. The standard BLAST+ tabular output contains 12 columns:
-
-====== ========= ============================================
-Column NCBI name Description
------- --------- --------------------------------------------
-     1 qseqid    Query Seq-id (ID of your sequence)
-     2 sseqid    Subject Seq-id (ID of the database hit)
-     3 pident    Percentage of identical matches
-     4 length    Alignment length
-     5 mismatch  Number of mismatches
-     6 gapopen   Number of gap openings
-     7 qstart    Start of alignment in query
-     8 qend      End of alignment in query
-     9 sstart    Start of alignment in subject (database hit)
-    10 send      End of alignment in subject (database hit)
-    11 evalue    Expectation value (E-value)
-    12 bitscore  Bit score
-====== ========= ============================================
-
-The BLAST+ tools can optionally output additional columns of information,
-but this takes longer to calculate. Most (but not all) of these columns are
-included by selecting the extended tabular output. The extra columns are
-included *after* the standard 12 columns. This is so that you can write
-workflow filtering steps that accept either the 12 or 24 column tabular
-BLAST output. Galaxy now uses this extended 24 column output by default.
-
-====== ============= ===========================================
-Column NCBI name     Description
------- ------------- -------------------------------------------
-    13 sallseqid     All subject Seq-id(s), separated by a ';'
-    14 score         Raw score
-    15 nident        Number of identical matches
-    16 positive      Number of positive-scoring matches
-    17 gaps          Total number of gaps
-    18 ppos          Percentage of positive-scoring matches
-    19 qframe        Query frame
-    20 sframe        Subject frame
-    21 qseq          Aligned part of query sequence
-    22 sseq          Aligned part of subject sequence
-    23 qlen          Query sequence length
-    24 slen          Subject sequence length
-====== ============= ===========================================
-
-The third option is BLAST XML output, which is designed to be parsed by
-another program, and is understood by some Galaxy tools.
-
-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).
-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.
-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,
-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).
-
--------
-
-**References**
-
-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
-
-This wrapper is available to install into other Galaxy instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-
-
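The Cheetah command template in the removed tblastx wrapper follows a simple pattern. It chooses exactly one of `-db` (a configured or history database) or `-subject`, expands `ext` into the explicit 24-column `-outfmt` string, and passes `-max_target_seqs` or `-word_size` only when a positive integer was given. The Python below is a hypothetical sketch of that logic that builds an argument list rather than a shell string. `build_tblastx_args` and its keyword names are assumptions. The advanced `-db_gencode`, `-matrix`, query filtering, strand and `-parse_deflines` options are left out. The two integer checks, which the template only applies when advanced options are selected, run unconditionally here.

```python
import os
import shlex
from typing import List, Optional, Union

# Same 24-column list the wrappers use when the user picks "ext".
EXT_OUTFMT = ("6 std sallseqid score nident positive gaps ppos "
              "qframe sframe qseq sseq qlen slen")


def _positive_int(value: object) -> bool:
    """Mirror the template's str(...) and int(str(...)) > 0 check."""
    text = str(value)
    return bool(text) and int(text) > 0


def build_tblastx_args(
    query: str,
    output: str,
    evalue: float,
    query_gencode: int,
    out_format: Union[str, int],
    db_path: Optional[str] = None,
    histdb_dir: Optional[str] = None,
    subject: Optional[str] = None,
    max_hits: object = "",
    word_size: object = "",
    num_threads: int = 8,
) -> List[str]:
    """Hypothetical sketch: build a tblastx argv list like the Cheetah template."""
    args = ["tblastx", "-query", query]
    # Exactly one search target, in the same order as the #if/#elif/#else.
    if db_path is not None:
        args += ["-db", db_path]
    elif histdb_dir is not None:
        args += ["-db", os.path.join(histdb_dir, "blastdb")]
    elif subject is not None:
        args += ["-subject", subject]
    else:
        raise ValueError("give db_path, histdb_dir or subject")
    args += ["-query_gencode", str(query_gencode),
             "-evalue", str(evalue),
             "-out", output]
    # "ext" is expanded here so saved workflows keep the same column list.
    outfmt = EXT_OUTFMT if str(out_format) == "ext" else str(out_format)
    args += ["-outfmt", outfmt, "-num_threads", str(num_threads)]
    # Optional integers are only passed when set and positive.
    if _positive_int(max_hits):
        args += ["-max_target_seqs", str(max_hits)]
    if _positive_int(word_size):
        args += ["-word_size", str(word_size)]
    return args


if __name__ == "__main__":
    argv = build_tblastx_args("query.fasta", "hits.tabular", 0.001, 1, "ext",
                              db_path="/data/blastdb/nt")
    print(shlex.join(argv))
```

Returning a list avoids the quoting the template does by hand (for example `-outfmt "6 std ..."`), and `shlex.join` can still render it as a single command for logging.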
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/repository_dependencies.xml
--- a/tools/ncbi_blast_plus/repository_dependencies.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,5 +0,0 @@
-
-
-
-
-
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/tool_dependencies.xml
--- a/tools/ncbi_blast_plus/tool_dependencies.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,20 +0,0 @@
-
-
-
-
-
- ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.26/ncbi-blast-2.2.26+-src.tar.gz
- cd c++ && ./configure --prefix=$INSTALL_DIR && make && make install
-
- $INSTALL_DIR/bin
-
-
-
-
-Downloads and compiles BLAST+ from the NCBI, which assumes you have
-all the required build dependencies installed. See:
-http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
-
-
-
-