Previous changeset 19:c1a6e5aefee0 (2013-05-29) Next changeset 21:6902315b7730 (2013-08-05) |
Commit message:
Uploaded v0.0.20 preview 11, moved to GitHub, MIT license, reST markup. |
modified:
test-data/blastx_sample.xml |
added:
ncbi_blast_plus/README.rst
ncbi_blast_plus/blastxml_to_tabular.py
ncbi_blast_plus/blastxml_to_tabular.xml
ncbi_blast_plus/ncbi_blastdbcmd_info.xml
ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml
ncbi_blast_plus/ncbi_blastn_wrapper.xml
ncbi_blast_plus/ncbi_blastp_wrapper.xml
ncbi_blast_plus/ncbi_blastx_wrapper.xml
ncbi_blast_plus/ncbi_makeblastdb.xml
ncbi_blast_plus/ncbi_rpsblast_wrapper.xml
ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml
ncbi_blast_plus/ncbi_tblastn_wrapper.xml
ncbi_blast_plus/ncbi_tblastx_wrapper.xml
ncbi_blast_plus/repository_dependencies.xml
ncbi_blast_plus/tool_dependencies.xml
removed:
tools/ncbi_blast_plus/blastxml_to_tabular.py
tools/ncbi_blast_plus/blastxml_to_tabular.xml
tools/ncbi_blast_plus/ncbi_blast_plus.txt
tools/ncbi_blast_plus/ncbi_blastdbcmd_info.xml
tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml
tools/ncbi_blast_plus/ncbi_blastn_wrapper.xml
tools/ncbi_blast_plus/ncbi_blastp_wrapper.xml
tools/ncbi_blast_plus/ncbi_blastx_wrapper.xml
tools/ncbi_blast_plus/ncbi_makeblastdb.xml
tools/ncbi_blast_plus/ncbi_rpsblast_wrapper.xml
tools/ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml
tools/ncbi_blast_plus/ncbi_tblastn_wrapper.xml
tools/ncbi_blast_plus/ncbi_tblastx_wrapper.xml
tools/ncbi_blast_plus/repository_dependencies.xml
tools/ncbi_blast_plus/tool_dependencies.xml
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/README.rst --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/README.rst Tue Jul 30 07:33:46 2013 -0400 |
@@ -0,0 +1,166 @@ +Galaxy wrappers for NCBI BLAST+ suite +===================================== + +These wrappers are copyright 2010-2013 by Peter Cock, The James Hutton Institute +(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. +See the licence text below. + +Currently tested with NCBI BLAST 2.2.26+ (i.e. version 2.2.26 of BLAST+), +and does not work with the NCBI 'legacy' BLAST suite (e.g. blastall). + +Note that these wrappers (and the associated datatypes) were originally +distributed as part of the main Galaxy repository, but as of August 2012 +moved to the Galaxy Tool Shed as 'ncbi_blast_plus' (and 'blast_datatypes'). +My thanks to Dannon Baker from the Galaxy development team for his assistance +with this. + +These wrappers are available from the Galaxy Tool Shed at: +http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus + + +Automated Installation +====================== + +Galaxy should be able to automatically install the dependencies, i.e. the +'blast_datatypes' repository which defines the BLAST XML file format +('blastxml') and protein and nucleotide BLAST databases ('blastdbp' and +'blastdbn'). + +You must tell Galaxy about any system level BLAST databases using configuration +files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein +databases like NR), and blastdb_d.loc (protein domain databases like CDD or +SMART) which are located in the tool-data/ folder. Sample files are included +which explain the tab-based format to use. 
+ +You can download the NCBI provided databases as tar-balls from here: + +* ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (nucleotide and protein databases like NR) +* ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ (domain databases like CDD) + + +Manual Installation +=================== + +For those not using Galaxy's automated installation from the Tool Shed, put +the XML and Python files in the tools/ncbi_blast_plus/ folder and add the XML +files to your tool_conf.xml as normal (and do the same in tool_conf.xml.sample +in order to run the unit tests). For example, use:: + + <section name="NCBI BLAST+" id="ncbi_blast_plus_tools"> + <tool file="ncbi_blast_plus/ncbi_blastn_wrapper.xml" /> + <tool file="ncbi_blast_plus/ncbi_blastp_wrapper.xml" /> + <tool file="ncbi_blast_plus/ncbi_blastx_wrapper.xml" /> + <tool file="ncbi_blast_plus/ncbi_tblastn_wrapper.xml" /> + <tool file="ncbi_blast_plus/ncbi_tblastx_wrapper.xml" /> + <tool file="ncbi_blast_plus/ncbi_makeblastdb.xml" /> + <tool file="ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml" /> + <tool file="ncbi_blast_plus/ncbi_blastdbcmd_info.xml" /> + <tool file="ncbi_blast_plus/ncbi_rpsblast_wrapper.xml" /> + <tool file="ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml" /> + <tool file="ncbi_blast_plus/blastxml_to_tabular.xml" /> + </section> + +You will also need to install 'blast_datatypes' from the Tool Shed. This +defines the BLAST XML file format ('blastxml') and protein and nucleotide +BLAST databases composite file formats ('blastdbp' and 'blastdbn'). + +As described above for an automated installation, you must also tell Galaxy +about any system level BLAST databases using the tool-data/blastdb*.loc files. + +You must install the NCBI BLAST+ standalone tools somewhere on the system +path. Currently the unit tests are written using "BLAST 2.2.26+". 
+ +Run the functional tests (adjusting the section identifier to match your +tool_conf.xml.sample file):: + + ./run_functional_tests.sh -sid NCBI_BLAST+-ncbi_blast_plus_tools + + +History +======= + +======= ====================================================================== +Version Changes +------- ---------------------------------------------------------------------- +v0.0.11 - Final revision as part of the Galaxy main repository, and the + first release via the Tool Shed +v0.0.12 - Implements genetic code option for translation searches. + - Changes <parallelism> to 1000 sequences at a time (to cope with + very large sets of queries where BLAST+ can become memory hungry) + - Include warning that BLAST+ with subject FASTA gives pairwise + e-values +v0.0.13 - Use the new error handling options in Galaxy (the previously + bundled hide_stderr.py script is no longer needed). +v0.0.14 - Support for makeblastdb and blastdbinfo with local BLAST databases + in the history (using work from Edward Kirton), requires v0.0.14 + of the 'blast_datatypes' repository from the Tool Shed. +v0.0.15 - Stronger warning in help text against searching against subject + FASTA files (better looking e-values than you might be expecting). +v0.0.16 - Added repository_dependencies.xml for automates installation of the + 'blast_datatypes' repository from the Tool Shed. +v0.0.17 - The BLAST+ search tools now default to extended tabular output + (all too often our users where having to re-run searches just to + get one of the missing columns like query or subject length) +v0.0.18 - Defensive quoting of filenames in case of spaces (where possible, + BLAST+ handling of some mult-file arguments is problematic). +v0.0.19 - Added wrappers for rpsblast and rpstblastn, and new blastdb_d.loc + for the domain databases they use (e.g. CDD, PFAM or SMART). + - Correct case of exception regular expression (for error handling + fall-back in case the return code is not set properly). 
+ - Clearer naming of output files. +v0.0.20 - Added unit tests for BLASTN and TBLASTX. + - Added percentage identity option to BLASTN. + - Fallback on ElementTree if cElementTree missing in XML to tabular. + - Link to Tool Shed added to help text and this documentation. + - Tweak dependency on blast_datatypes to also work on Test Tool Shed + - Adopted standard MIT License. + - Development moved to GitHub, https://github.com/peterjc/galaxy_blast +======= ====================================================================== + + +Bug Reports +=========== + +You can file an issue here https://github.com/peterjc/galaxy_blast/issues or ask +us on the Galaxy development list http://lists.bx.psu.edu/listinfo/galaxy-dev + + +Developers +========== + +This script and related tools were originally developed on the 'tools' branch +of the following Mercurial repository: +https://bitbucket.org/peterjc/galaxy-central/ + +As of July 2013, development is continuing on a dedicated GitHub repository: +https://github.com/peterjc/galaxy_blast + +For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball I use +the following command from the GitHub repository root folder:: + + $ ./ncbi_blast_plus/make_ncbi_blast_plus.sh + +This simplifies ensuring a consistent set of files is bundled each time, +including all the relevant test files. + + +Licence (MIT) +============= + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. |
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/blastxml_to_tabular.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/blastxml_to_tabular.py Tue Jul 30 07:33:46 2013 -0400 |
b'@@ -0,0 +1,261 @@\n+#!/usr/bin/env python\n+"""Convert a BLAST XML file to tabular output.\n+\n+Takes three command line options, input BLAST XML filename, output tabular\n+BLAST filename, output format (std for standard 12 columns, or ext for the\n+extended 24 columns offered in the BLAST+ wrappers).\n+\n+The 12 columns output are \'qseqid sseqid pident length mismatch gapopen qstart\n+qend sstart send evalue bitscore\' or \'std\' at the BLAST+ command line, which\n+mean:\n+ \n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The additional columns offered in the Galaxy BLAST+ wrappers are:\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a \';\'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+Most of these fields are 
given explicitly in the XML file, others some like\n+the percentage identity and the number of gap openings must be calculated.\n+\n+Be aware that the sequence in the extended tabular output or XML direct from\n+BLAST+ may or may not use XXXX masking on regions of low complexity. This\n+can throw the off the calculation of percentage identity and gap openings.\n+[In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,\n+with these numbers changing depending on whether or not the low complexity\n+filter is used.]\n+\n+This script attempts to produce identical output to what BLAST+ would have done.\n+However, check this with "diff -b ..." since BLAST+ sometimes includes an extra\n+space character (probably a bug).\n+"""\n+import sys\n+import re\n+\n+if "-v" in sys.argv or "--version" in sys.argv:\n+ print "v0.0.12"\n+ sys.exit(0)\n+\n+if sys.version_info[:2] >= ( 2, 5 ):\n+ try:\n+ from xml.etree import cElementTree as ElementTree\n+ except ImportError:\n+ from xml.etree import ElementTree as ElementTree\n+else:\n+ from galaxy import eggs\n+ import pkg_resources; pkg_resources.require( "elementtree" )\n+ from elementtree import ElementTree\n+\n+def stop_err( msg ):\n+ sys.stderr.write("%s\\n" % msg)\n+ sys.exit(1)\n+\n+#Parse Command Line\n+try:\n+ in_file, out_file, out_fmt = sys.argv[1:]\n+except:\n+ stop_err("Expect 3 arguments: input BLAST XML file, output tabular file, out format (std or ext)")\n+\n+if out_fmt == "std":\n+ extended = False\n+elif out_fmt == "x22":\n+ stop_err("Format argument x22 has been replaced with ext (extended 24 columns)")\n+elif out_fmt == "ext":\n+ extended = True\n+else:\n+ stop_err("Format argument should be std (12 column) or ext (extended 24 columns)")\n+\n+\n+# get an iterable\n+try: \n+ context = ElementTree.iterparse(in_file, events=("start", "end")'..b'")\n+ xx = sum(1 for q,h in zip(q_seq, h_seq) if q=="X" and h=="X")\n+ if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= expected_mismatch + 
xx):\n+ stop_err("%s vs %s mismatches, expected %i <= %i <= %i" \\\n+ % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),\n+ int(mismatch), expected_mismatch))\n+\n+ #TODO - Remove this alternative identity calculation and test\n+ #once satisifed there are no problems\n+ expected_identity = sum(1 for q,h in zip(q_seq, h_seq) if q == h)\n+ if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):\n+ stop_err("%s vs %s identities, expected %i <= %i <= %i" \\\n+ % (qseqid, sseqid, expected_identity, int(nident),\n+ expected_identity + q_seq.count("X")))\n+ \n+\n+ evalue = hsp.findtext("Hsp_evalue")\n+ if evalue == "0":\n+ evalue = "0.0"\n+ else:\n+ evalue = "%0.0e" % float(evalue)\n+ \n+ bitscore = float(hsp.findtext("Hsp_bit-score"))\n+ if bitscore < 100:\n+ #Seems to show one decimal place for lower scores\n+ bitscore = "%0.1f" % bitscore\n+ else:\n+ #Note BLAST does not round to nearest int, it truncates\n+ bitscore = "%i" % bitscore\n+\n+ values = [qseqid,\n+ sseqid,\n+ pident,\n+ length, #hsp.findtext("Hsp_align-len")\n+ str(mismatch),\n+ gapopen,\n+ hsp.findtext("Hsp_query-from"), #qstart,\n+ hsp.findtext("Hsp_query-to"), #qend,\n+ hsp.findtext("Hsp_hit-from"), #sstart,\n+ hsp.findtext("Hsp_hit-to"), #send,\n+ evalue, #hsp.findtext("Hsp_evalue") in scientific notation\n+ bitscore, #hsp.findtext("Hsp_bit-score") rounded\n+ ]\n+\n+ if extended:\n+ sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(">"))\n+ #print hit_def, "-->", sallseqid\n+ positive = hsp.findtext("Hsp_positive")\n+ ppos = "%0.2f" % (100*float(positive)/float(length))\n+ qframe = hsp.findtext("Hsp_query-frame")\n+ sframe = hsp.findtext("Hsp_hit-frame")\n+ if blast_program == "blastp":\n+ #Probably a bug in BLASTP that they use 0 or 1 depending on format\n+ if qframe == "0": qframe = "1"\n+ if sframe == "0": sframe = "1"\n+ slen = int(hit.findtext("Hit_len"))\n+ values.extend([sallseqid,\n+ hsp.findtext("Hsp_score"), #score,\n+ nident,\n+ 
positive,\n+ hsp.findtext("Hsp_gaps"), #gaps,\n+ ppos,\n+ qframe,\n+ sframe,\n+ #NOTE - for blastp, XML shows original seq, tabular uses XXX masking\n+ q_seq,\n+ h_seq,\n+ str(qlen),\n+ str(slen),\n+ ])\n+ #print "\\t".join(values) \n+ outfile.write("\\t".join(values) + "\\n")\n+ # prevents ElementTree from growing large datastructure\n+ root.clear()\n+ elem.clear()\n+outfile.close()\n' |
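The number formatting in blastxml_to_tabular.py is easy to misread in the flattened diff above, so here is a minimal sketch of just that part. It is written for Python 3 (the committed script targets Python 2), and the function names `format_evalue`, `format_bitscore` and `format_ppos` are hypothetical helpers introduced for illustration; the script itself does this inline.

```python
def format_evalue(text):
    """Format an Hsp_evalue string the way the script does."""
    # The script special-cases a literal "0" and otherwise uses
    # scientific notation with no decimal places.
    if text == "0":
        return "0.0"
    return "%0.0e" % float(text)


def format_bitscore(text):
    """Format an Hsp_bit-score string the way the script does."""
    score = float(text)
    if score < 100:
        # One decimal place for lower scores.
        return "%0.1f" % score
    # "%i" truncates rather than rounding to the nearest integer,
    # which the script's comment says matches BLAST+.
    return "%i" % score


def format_ppos(positive, length):
    """Percentage of positive-scoring matches, two decimal places."""
    return "%0.2f" % (100 * float(positive) / float(length))
```

For example, `format_bitscore("250.9")` truncates to `"250"`, while `format_bitscore("52.4")` keeps one decimal place.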
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/blastxml_to_tabular.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/blastxml_to_tabular.xml Tue Jul 30 07:33:46 2013 -0400 |
@@ -0,0 +1,137 @@ +<tool id="blastxml_to_tabular" name="BLAST XML to tabular" version="0.0.11"> + <description>Convert BLAST XML output to tabular</description> + <version_command interpreter="python">blastxml_to_tabular.py --version</version_command> + <command interpreter="python"> + blastxml_to_tabular.py $blastxml_file $tabular_file $out_format + </command> + <stdio> + <!-- Anything other than zero is an error --> + <exit_code range="1:" /> + <exit_code range=":-1" /> + </stdio> + <inputs> + <param name="blastxml_file" type="data" format="blastxml" label="BLAST results as XML"/> + <param name="out_format" type="select" label="Output format"> + <option value="std">Tabular (standard 12 columns)</option> + <option value="ext" selected="True">Tabular (extended 24 columns)</option> + </param> + </inputs> + <outputs> + <data name="tabular_file" format="tabular" label="BLAST results as tabular" /> + </outputs> + <requirements> + </requirements> + <tests> + <test> + <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin.tabluar --> + <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" /> + <param name="out_format" value="ext" /> + <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin_22c.tabluar --> + <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted_ext.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastp_sample.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastp output --> + <output name="tabular_file" 
file="blastp_sample_converted.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastx output --> + <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" /> + <param name="out_format" value="ext" /> + <!-- Note this has some white space and XXXX masking differences from the actual blastx output --> + <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted_ext.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastx_sample.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastx output --> + <output name="tabular_file" file="blastx_sample_converted.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastp output --> + <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_std.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" /> + <param name="out_format" value="ext" /> + <!-- Note this has some white space differences from the actual blastp output --> + <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_ext.tabular" ftype="tabular" /> + </test> + </tests> + <help> + +**What it does** + +NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of +formats including tabular and a more detailed XML format. 
A complex workflow +may need both the XML and the tabular output - but running BLAST twice is +slow and wasteful. + +This tool takes the BLAST XML output and can convert it into the +standard 12 column tabular equivalent: + +====== ========= ============================================ +Column NCBI name Description +------ --------- -------------------------------------------- + 1 qseqid Query Seq-id (ID of your sequence) + 2 sseqid Subject Seq-id (ID of the database hit) + 3 pident Percentage of identical matches + 4 length Alignment length + 5 mismatch Number of mismatches + 6 gapopen Number of gap openings + 7 qstart Start of alignment in query + 8 qend End of alignment in query + 9 sstart Start of alignment in subject (database hit) + 10 send End of alignment in subject (database hit) + 11 evalue Expectation value (E-value) + 12 bitscore Bit score +====== ========= ============================================ + +The BLAST+ tools can optionally output additional columns of information, +but this takes longer to calculate. Most (but not all) of these columns are +included by selecting the extended tabular output. The extra columns are +included *after* the standard 12 columns. This is so that you can write +workflow filtering steps that accept either the 12 or 22 column tabular +BLAST output. This tool now uses this extended 24 column output by default. 
+ +====== ============= =========================================== +Column NCBI name Description +------ ------------- ------------------------------------------- + 13 sallseqid All subject Seq-id(s), separated by a ';' + 14 score Raw score + 15 nident Number of identical matches + 16 positive Number of positive-scoring matches + 17 gaps Total number of gaps + 18 ppos Percentage of positive-scoring matches + 19 qframe Query frame + 20 sframe Subject frame + 21 qseq Aligned part of query sequence + 22 sseq Aligned part of subject sequence + 23 qlen Query sequence length + 24 slen Subject sequence length +====== ============= =========================================== + +Beware that the XML file (and thus the conversion) and the tabular output +direct from BLAST+ may differ in the presence of XXXX masking on regions +low complexity (columns 21 and 22), and thus also calculated figures like +the percentage identity (column 3). + +**References** + +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus + </help> +</tool> |
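Because the extra columns come after the standard 12, a downstream filter can read only the first 12 fields and work on either output. The following Python 3 sketch illustrates that idea; `parse_hit`, `filter_by_identity` and the 90% threshold are assumptions made for the example and are not part of the wrappers.

```python
# Names of the standard 12 columns, in order, as listed in the help text.
STD_COLUMNS = (
    "qseqid sseqid pident length mismatch gapopen "
    "qstart qend sstart send evalue bitscore"
).split()


def parse_hit(line):
    """Return the standard 12 columns of one tabular line as a dict.

    Works for both the 12 column and the extended 24 column output,
    since any extra fields are simply ignored.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(STD_COLUMNS):
        raise ValueError("Expected at least 12 columns, got %i" % len(fields))
    return dict(zip(STD_COLUMNS, fields[: len(STD_COLUMNS)]))


def filter_by_identity(lines, min_pident=90.0):
    """Yield the lines whose percentage identity meets the threshold."""
    for line in lines:
        if float(parse_hit(line)["pident"]) >= min_pident:
            yield line
```

The same function can be applied to a file handle opened on either kind of tabular output, for example `list(filter_by_identity(open("hits.tabular")))`, where the filename is a placeholder.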
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastdbcmd_info.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_blastdbcmd_info.xml Tue Jul 30 07:33:46 2013 -0400 |
@@ -0,0 +1,67 @@ +<tool id="ncbi_blastdbcmd_info" name="NCBI BLAST+ database info" version="0.0.6"> + <description>Show BLAST database information from blastdbcmd</description> + <requirements> + <requirement type="binary">blastdbcmd</requirement> + <requirement type="package" version="2.2.26+">blast+</requirement> + </requirements> + <version_command>blastdbcmd -version</version_command> + <command> +blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}" -info -out "$info" + </command> + <stdio> + <!-- Anything other than zero is an error --> + <exit_code range="1:" /> + <exit_code range=":-1" /> + <!-- Suspect blastdbcmd sometimes fails to set error level --> + <regex match="Error:" /> + <regex match="Exception:" /> + </stdio> + <inputs> + <conditional name="db_opts"> + <param name="db_type" type="select" label="Type of BLAST database"> + <option value="nucl" selected="True">Nucleotide</option> + <option value="prot">Protein</option> + </param> + <when value="nucl"> + <param name="database" type="select" label="Nucleotide BLAST database"> + <options from_file="blastdb.loc"> + <column name="value" index="0"/> + <column name="name" index="1"/> + <column name="path" index="2"/> + </options> + </param> + </when> + <when value="prot"> + <param name="database" type="select" label="Protein BLAST database"> + <options from_file="blastdb_p.loc"> + <column name="value" index="0"/> + <column name="name" index="1"/> + <column name="path" index="2"/> + </options> + </param> + </when> + </conditional> + </inputs> + <outputs> + <data name="info" format="txt" label="${db_opts.database.fields.name} info" /> + </outputs> + <help> + +**What it does** + +Calls the NCBI BLAST+ blastdbcmd command line tool with the -info +switch to give summary information about a BLAST database, such as +the size (number of sequences and total length) and date. + +------- + +**References** + +Altschul et al. 
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402. + +Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005. + +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus + </help> +</tool> |
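The tool XML above reads blastdb.loc and blastdb_p.loc as tab-separated files with the value in column 0, the display name in column 1 and the database path in column 2. A minimal Python 3 sketch of reading such a file follows. It assumes that blank lines and lines starting with `#` are to be skipped; that is an assumption for this example rather than something stated in the diff, and `parse_loc` is a hypothetical helper, not Galaxy's own loader.

```python
def parse_loc(lines):
    """Parse tab-separated .loc lines into dicts with value, name and path.

    Assumption: blank lines and lines starting with '#' are skipped,
    as are lines with fewer than three tab-separated fields.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        value, name, path = fields[:3]
        entries.append({"value": value, "name": name, "path": path})
    return entries
```

The `path` entry is what the wrapper passes to `-db`, and `name` is what appears in the output label.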
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml Tue Jul 30 07:33:46 2013 -0400 |
@@ -0,0 +1,139 @@ +<tool id="ncbi_blastdbcmd_wrapper" name="NCBI BLAST+ blastdbcmd entry(s)" version="0.0.6"> + <description>Extract sequence(s) from BLAST database</description> + <requirements> + <requirement type="binary">blastdbcmd</requirement> + <requirement type="package" version="2.2.26+">blast+</requirement> + </requirements> + <version_command>blastdbcmd -version</version_command> + <command> +## The command is a Cheetah template which allows some Python based syntax. +## Lines starting hash hash are comments. Galaxy will turn newlines into spaces +blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}" + +##TODO: What about -ctrl_a and -target_only as advanced options? + +#if $id_opts.id_type=="file": +-entry_batch "$id_opts.entries" +#else: +##Perform some simple search/replaces to remove whitespace +##and make it comma separated, and escape any pipe characters +-entry "$id_opts.entries.replace('\r',',').replace('\n',',').replace(' ','').replace(',,',',').replace(',,',',').strip(',').replace('|','\|')" +#end if + +##When building a BLAST database, to ensure unique IDs makeblastdb will +##do things like turning a FASTA entry with ID of ERP44 into lcl|ERP44 +##(if using -parse_seqids) or simply assign it an ID using the record +##number like gnl|BL_ORD_ID|123 (to cope with duplicate IDs in the FASTA +##file). In -parse_seqids mode, a duplicate FASTA ID gives an error. +## +##The BLAST plain text and XML output will contain these BLAST IDs, but +##the tabular output does not (at least, not in BLAST 2.2.25+). +##Therefore in general, Galaxy users won't care about the (internal) +##BLAST identifiers. +## +##The blastdbcmd FASTA output will also contain these IDs, but in the +##context of the BLAST tabular output they are not helpful. Therefore +##to recover the original ID as used in the FASTA file for makeblastdb +##we need a litte post processing. +## +##We remove the NCBI's lcl|... 
or gnl|BL_ORD_ID|123 prefixes +##using sed, however the exact syntax differs for Mac OS X's sed + +#if str($outfmt)=="blastid": +-out "$seq" +#else if sys.platform == "darwin": +| sed -E 's/^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )/>/1' > "$seq" +#else: +| sed 's/>\(lcl|\|gnl|BL_ORD_ID|[0-9]* \)/>/1' > "$seq" +#end if + </command> + <stdio> + <!-- Anything other than zero is an error --> + <exit_code range="1:" /> + <exit_code range=":-1" /> + <!-- Suspect blastdbcmd sometimes fails to set error level --> + <regex match="Error:" /> + <regex match="Exception:" /> + </stdio> + <inputs> + <conditional name="db_opts"> + <param name="db_type" type="select" label="Type of BLAST database"> + <option value="nucl" selected="True">Nucleotide</option> + <option value="prot">Protein</option> + </param> + <when value="nucl"> + <param name="database" type="select" label="Nucleotide BLAST database"> + <options from_file="blastdb.loc"> + <column name="value" index="0"/> + <column name="name" index="1"/> + <column name="path" index="2"/> + </options> + </param> + </when> + <when value="prot"> + <param name="database" type="select" label="Protein BLAST database"> + <options from_file="blastdb_p.loc"> + <column name="value" index="0"/> + <column name="name" index="1"/> + <column name="path" index="2"/> + </options> + </param> + </when> + </conditional> + <conditional name="id_opts"> + <param name="id_type" type="select" label="Type of identifier list"> + <option value="file">From file</option> + <option value="prompt">User entered</option> + </param> + <when value="file"> + <param name="entries" type="data" format="txt,tabular" label="Sequence identifier(s)" help="Plain text file with one ID per line (i.e. single column tabular file)"/> + </when> + <when value="prompt"> + <param name="entries" type="text" label="Sequence identifier(s)" help="Comma or new line separated list." 
optional="False" area="True" size="10x30"/> + </when> + </conditional> + <param name="outfmt" type="select" label="Output format"> + <option value="original">FASTA with original identifiers</option> + <option value="blastid">FASTA with BLAST assigned identifiers</option> + </param> + </inputs> + <outputs> + <data name="seq" format="fasta" label="Sequences from ${db_opts.database.fields.name}" /> + </outputs> + <help> + +**What it does** + +Extracts FASTA formatted sequences from a BLAST database +using the NCBI BLAST+ blastdbcmd command line tool. + +.. class:: warningmark + +**BLAST assigned identifiers** + +When a BLAST database is constructed from a FASTA file, the +original identifiers can be replaced with BLAST assigned +identifiers, partly to ensure uniqueness. e.g. Sometimes +a prefix of 'lcl|' is added (lcl is short for local), +or an arbitrary name starting 'gnl|BL_ORD_ID|' is created. + +If you are using the tabular output from BLAST, it will contain +the original identifiers - not the BLAST assigned identifiers +suitable for use with the blastdbcmd tool. + +If you are using the XML or plain text output, this will also +contain the BLAST assigned identifiers. However, this means +getting a list of BLAST assigned identifiers isn't straightforward. + +------- + +**References** + +Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402. + +Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005. + +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus + </help> +</tool> |
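The two string manipulations in this wrapper (tidying the user-entered identifier list, and stripping the BLAST assigned prefixes with sed) can be hard to follow inside the Cheetah template. The Python 3 sketch below mirrors them for illustration only. The wrapper itself uses the Cheetah expression and sed shown above; `clean_entries` and `strip_blast_prefix` are hypothetical names.

```python
import re


def clean_entries(text):
    """Mirror the template's search/replace chain for user-entered IDs.

    Newlines become commas, spaces are removed, doubled commas are
    collapsed (two passes, as in the template), leading and trailing
    commas are stripped, and pipe characters are backslash-escaped.
    """
    return (
        text.replace("\r", ",")
        .replace("\n", ",")
        .replace(" ", "")
        .replace(",,", ",")
        .replace(",,", ",")
        .strip(",")
        .replace("|", "\\|")
    )


# Same pattern as the sed -E expression: either a 'lcl|' prefix, or a
# 'gnl|BL_ORD_ID|<number> ' prefix including its trailing space.
_BLAST_PREFIX = re.compile(r"^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )")


def strip_blast_prefix(line):
    """Remove a BLAST assigned prefix from one FASTA header line."""
    return _BLAST_PREFIX.sub(">", line, count=1)
```

Lines that are not FASTA headers, or headers without one of these prefixes, pass through `strip_blast_prefix` unchanged.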
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastn_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_blastn_wrapper.xml Tue Jul 30 07:33:46 2013 -0400 |
b'@@ -0,0 +1,257 @@\n+<tool id="ncbi_blastn_wrapper" name="NCBI BLAST+ blastn" version="0.0.20">\n+ <description>Search nucleotide database with nucleotide query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n+ <requirements>\n+ <requirement type="binary">blastn</requirement>\n+ <requirement type="package" version="2.2.26+">blast+</requirement>\n+ </requirements>\n+ <version_command>blastn -version</version_command>\n+ <command>\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+blastn\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#elif $db_opts.db_opts_selector == "histdb":\n+ -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-task $blast_type\n+-evalue $evalue_cutoff\n+-out "$output1"\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+$adv_opts.filter_query\n+$adv_opts.strand\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.identity_cutoff) and float(str($adv_opts.identity_cutoff)) > 0 ):\n+-perc_identity $adv_opts.identity_cutoff\n+#end if\n+#if (str($adv_opts.word_size) and 
int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+$adv_opts.ungapped\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <stdio>\n+ <!-- Anything other than zero is an error -->\n+ <exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ <!-- In case the return code has not been set properly, check stderr too -->\n+ <regex match="Error:" />\n+ <regex match="Exception:" />\n+ </stdio>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" label="Subject database/sequences">\n+ <option value="db" selected="True">Locally installed BLAST database</option>\n+ <option value="histdb">BLAST database from your history</option>\n+ <option value="file">FASTA file from your history (see warning note below)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Nucleotide BLAST database">\n+ <options from_file="blastdb.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="hidden" value="" /> \n+ </when>\n+ <when value="histdb">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="data" format="blastdbn" label="Nucleotide BLAST database" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="hidden" value="" /'..b"rk\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. 
\n+\n+-----\n+\n+**What it does**\n+\n+Search a *nucleotide database* using a *nucleotide query*,\n+using the NCBI BLAST+ blastn command line tool.\n+Algorithms include blastn, megablast, and discontiguous megablast.\n+\n+.. class:: warningmark\n+\n+You can also search against a FASTA file of subject nucleotide\n+sequences. This is *not* advised because it is slower (only one\n+CPU is used), but more importantly gives e-values for pairwise\n+searches (very small e-values which will look overly significant).\n+In most cases you should instead turn the other FASTA file into a\n+database first using *makeblastdb* and search against that.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output. 
Galaxy now uses this extended 24 column output by default.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a ';'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Zhang et al. A Greedy Algorithm for Aligning DNA Sequences. 2000. JCB: 203-214.\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n+ </help>\n+</tool>\n" |
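The help text above says the extended columns are appended after the standard 12 so that a filtering step can accept either layout. A minimal Python sketch of such a step follows; the column names come from the two tables above, while the function names, the handling of `#` comment lines, and the percentage-identity filter are illustrative assumptions rather than part of the wrapper.

```python
import io

# Column names as listed in the help text: the standard 12, then the 12 extended ones.
STD_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
EXT_COLUMNS = ["sallseqid", "score", "nident", "positive", "gaps", "ppos",
               "qframe", "sframe", "qseq", "sseq", "qlen", "slen"]


def read_blast_tabular(handle):
    """Yield one dict per hit from 12 or 24 column tabular BLAST output."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # assumption: skip blank lines and any comment lines
        fields = line.split("\t")
        if len(fields) == 12:
            names = STD_COLUMNS
        elif len(fields) == 24:
            names = STD_COLUMNS + EXT_COLUMNS
        else:
            raise ValueError("Expected 12 or 24 columns, got %d" % len(fields))
        yield dict(zip(names, fields))


def filter_by_identity(handle, min_pident):
    """Hypothetical filter: keep hits with at least min_pident percent identity."""
    return [hit for hit in read_blast_tabular(handle)
            if float(hit["pident"]) >= min_pident]


if __name__ == "__main__":
    sample = "q1\ts1\t98.5\t100\t1\t0\t1\t100\t5\t104\t1e-50\t180\n"
    for hit in filter_by_identity(io.StringIO(sample), 90.0):
        print(hit["qseqid"], hit["sseqid"], hit["pident"])
```

Because the filter only touches columns 1 to 12, it behaves the same on both output variants.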
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastp_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_blastp_wrapper.xml Tue Jul 30 07:33:46 2013 -0400 |
b'@@ -0,0 +1,308 @@\n+<tool id="ncbi_blastp_wrapper" name="NCBI BLAST+ blastp" version="0.0.20">\n+ <description>Search protein database with protein query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n+ <requirements>\n+ <requirement type="binary">blastp</requirement>\n+ <requirement type="package" version="2.2.26+">blast+</requirement>\n+ </requirements>\n+ <version_command>blastp -version</version_command>\n+ <command>\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+blastp\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#elif $db_opts.db_opts_selector == "histdb":\n+ -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-task $blast_type\n+-evalue $evalue_cutoff\n+-out "$output1"\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+$adv_opts.filter_query\n+-matrix $adv_opts.matrix\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+##Ungapped disabled for now - see comments 
below\n+##$adv_opts.ungapped\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <stdio>\n+ <!-- Anything other than zero is an error -->\n+ <exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ <!-- In case the return code has not been set properly, check stderr too -->\n+ <regex match="Error:" />\n+ <regex match="Exception:" />\n+ </stdio>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" label="Subject database/sequences">\n+ <option value="db" selected="True">Locally installed BLAST database</option>\n+ <option value="histdb">BLAST database from your history</option>\n+ <option value="file">FASTA file from your history (see warning note below)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Protein BLAST database">\n+ <options from_file="blastdb_p.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="hidden" value="" /> \n+ </when>\n+ <when value="histdb">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="data" format="blastdbp" label="Protein BLAST database" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" /> \n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="data" format="fasta" label="Protein FASTA '..b"+\n+**What it does**\n+\n+Search a *protein database* using a *protein query*,\n+using the NCBI BLAST+ blastp command line tool.\n+\n+.. class:: warningmark\n+\n+You can also search against a FASTA file of subject protein\n+sequences. 
This is *not* advised because it is slower (only one\n+CPU is used), but more importantly gives e-values for pairwise\n+searches (very small e-values which will look overly significant).\n+In most cases you should instead turn the other FASTA file into a\n+database first using *makeblastdb* and search against that.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output. 
Galaxy now uses this extended 24 column output by default.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a ';'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n+\n+Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 
29:2994-3005.\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n+ </help>\n+</tool>\n" |
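The `<stdio>` block in these wrappers treats any non-zero exit code as an error and additionally looks for `Error:` or `Exception:` in the tool output in case the return code was not set. The Python sketch below mirrors that rule for illustration only: the function name `job_failed` is hypothetical, and it checks only the stderr text (as the XML comment suggests), which may not match exactly how Galaxy applies its `<regex>` rules.

```python
import re

# Patterns copied from the <regex match="..."/> elements in the wrapper.
ERROR_PATTERNS = [re.compile("Error:"), re.compile("Exception:")]


def job_failed(exit_code, stderr_text):
    """Return True if the run should be treated as failed.

    Mirrors the two exit_code ranges ("1:" and ":-1", i.e. anything
    other than zero) and then falls back to scanning stderr.
    """
    if exit_code != 0:
        return True
    return any(pattern.search(stderr_text) for pattern in ERROR_PATTERNS)


if __name__ == "__main__":
    print(job_failed(0, ""))                      # clean run
    print(job_failed(0, "BLAST engine Error: x")) # zero exit code, but error text
    print(job_failed(2, ""))                      # non-zero exit code
```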
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_blastx_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_blastx_wrapper.xml Tue Jul 30 07:33:46 2013 -0400 |
b'@@ -0,0 +1,294 @@\n+<tool id="ncbi_blastx_wrapper" name="NCBI BLAST+ blastx" version="0.0.19">\n+ <description>Search protein database with translated nucleotide query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n+ <requirements>\n+ <requirement type="binary">blastx</requirement>\n+ <requirement type="package" version="2.2.26+">blast+</requirement>\n+ </requirements>\n+ <version_command>blastx -version</version_command>\n+ <command>\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+blastx\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#elif $db_opts.db_opts_selector == "histdb":\n+ -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-query_gencode $query_gencode\n+-evalue $evalue_cutoff\n+-out "$output1"\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+$adv_opts.filter_query\n+$adv_opts.strand\n+-matrix $adv_opts.matrix\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end 
if\n+$adv_opts.ungapped\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <stdio>\n+ <!-- Anything other than zero is an error -->\n+ <exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ <!-- In case the return code has not been set properly, check stderr too -->\n+ <regex match="Error:" />\n+ <regex match="Exception:" />\n+ </stdio>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" label="Subject database/sequences">\n+ <option value="db" selected="True">Locally installed BLAST database</option>\n+ <option value="histdb">BLAST database from your history</option>\n+ <option value="file">FASTA file from your history (see warning note below)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Protein BLAST database">\n+ <options from_file="blastdb_p.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <when value="histdb">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="data" format="blastdbp" label="Protein BLAST database" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="data" format="fasta" label="Protein FASTA file to'..b"ingmark\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. \n+\n+-----\n+\n+**What it does**\n+\n+Search a *protein database* using a *translated nucleotide query*,\n+using the NCBI BLAST+ blastx command line tool.\n+\n+.. 
class:: warningmark\n+\n+You can also search against a FASTA file of subject protein\n+sequences. This is *not* advised because it is slower (only one\n+CPU is used), but more importantly gives e-values for pairwise\n+searches (very small e-values which will look overly significant).\n+In most cases you should instead turn the other FASTA file into a\n+database first using *makeblastdb* and search against that.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output. 
Galaxy now uses this extended 24 column output by default.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a ';'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length \n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n+ </help>\n+</tool>\n" |
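The Cheetah templates in these wrappers only emit `-max_target_seqs` and `-word_size` when the user supplied a non-empty value greater than zero, using the `str(...) and int(str(...)) > 0` idiom. A small Python sketch of the same guard is given below; `optional_flag` is a hypothetical helper name, and the plain values passed in stand in for Galaxy's `InputValueWrapper` objects.

```python
def optional_flag(flag, raw_value, cast=int):
    """Return [flag, value] if raw_value is non-empty and > 0, else [].

    Converting to str first mirrors the template, where the parameter is
    a wrapper object rather than a plain string or number.
    """
    text = str(raw_value)
    if text and cast(text) > 0:
        return [flag, text]
    return []


if __name__ == "__main__":
    args = ["blastx", "-evalue", "0.001"]
    args += optional_flag("-max_target_seqs", "")   # left blank: omitted
    args += optional_flag("-word_size", 0)          # zero: omitted
    args += optional_flag("-max_target_seqs", 25)   # positive: included
    print(" ".join(args))
```

Passing `cast=float` covers the blastn `-perc_identity` case, where the template uses `float(str(...))` instead of `int(str(...))`.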
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_makeblastdb.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_makeblastdb.xml Tue Jul 30 07:33:46 2013 -0400 |
@@ -0,0 +1,129 @@ +<tool id="ncbi_makeblastdb" name="NCBI BLAST+ makeblastdb" version="0.0.5"> + <description>Make BLAST database</description> + <requirements> + <requirement type="binary">makeblastdb</requirement> + <requirement type="package" version="2.2.26+">blast+</requirement> + </requirements> + <version_command>makeblastdb -version</version_command> + <command> +makeblastdb -out "${os.path.join($outfile.extra_files_path,'blastdb')}" +$parse_seqids +$hash_index +## Single call to -in with multiple filenames space separated with outer quotes +## (presumably any filenames with spaces would be a problem). Note this gives +## some extra spaces, e.g. -in " file1 file2 file3 " but BLAST seems happy: +-in " +#for $i in $in +${i.file} #end for +" +#if $title: +-title "$title" +#else: +##Would default to being based on the cryptic Galaxy filenames, which is unhelpful +-title "BLAST Database" +#end if +-dbtype $dbtype +## #set $sep = '-mask_data ' +## #for $i in $mask_data +## $sep${i.file} +## #set $sep = ', ' +## #end for +## #set $sep = '-gi_mask -gi_mask_name ' +## #for $i in $gi_mask +## $sep${i.file} +## #set $sep = ', ' +## #end for +## #if $tax.select == 'id': +## -taxid $tax.id +## #else if $tax.select == 'map': +## -taxid_map $tax.map +## #end if +</command> +<stdio> + <!-- Anything other than zero is an error --> + <exit_code range="1:" /> + <exit_code range=":-1" /> + <!-- In case the return code has not been set properly, check stderr too --> + <regex match="Error:" /> + <regex match="Exception:" /> +</stdio> +<inputs> + <param name="dbtype" type="select" display="radio" label="Molecule type of input"> + <option value="prot">protein</option> + <option value="nucl">nucleotide</option> + </param> + <!-- TODO Allow merging of existing BLAST databases (conditional on the database type) + <repeat name="in" title="Blast or Fasta Database" min="1"> + <param name="file" type="data" format="fasta,blastdbn,blastdbp" label="Blast or Fasta database" /> + </repeat> + 
--> + <repeat name="in" title="FASTA file" min="1"> + <param name="file" type="data" format="fasta" /> + </repeat> + <param name="title" type="text" value="" label="Title for BLAST database" help="This is the database name shown in BLAST search output" /> + <param name="parse_seqids" type="boolean" truevalue="-parse_seqids" falsevalue="" checked="False" label="Parse the sequence identifiers" help="This is only advised if your FASTA file follows the NCBI naming conventions using pipe '|' symbols" /> + <param name="hash_index" type="boolean" truevalue="-hash_index" falsevalue="" checked="true" label="Enable the creation of sequence hash values." help="These hash values can then be used to quickly determine if a given sequence data exists in this BLAST database." /> + + <!-- SEQUENCE MASKING OPTIONS --> + <!-- TODO + <repeat name="mask_data" title="Provide one or more files containing masking data"> + <param name="file" type="data" format="asnb" label="File containing masking data" help="As produced by NCBI masking applications (e.g. 
dustmasker, segmasker, windowmasker)" /> + </repeat> + <repeat name="gi_mask" title="Create GI indexed masking data"> + <param name="file" type="data" format="asnb" label="Masking data output file" /> + </repeat> + --> + + <!-- TAXONOMY OPTIONS --> + <!-- TODO + <conditional name="tax"> + <param name="select" type="select" label="Taxonomy options"> + <option value="">Do not assign sequences to Taxonomy IDs</option> + <option value="id">Assign all sequences to one Taxonomy ID</option> + <option value="map">Supply text file mapping sequence IDs to taxonomy IDs</option> + </param> + <when value=""> + </when> + <when value="id"> + <param name="id" type="integer" value="" label="NCBI taxonomy ID" help="Integer >=0" /> + </when> + <when value="map"> + <param name="file" type="data" format="txt" label="Seq ID : Tax ID mapping file" help="Format: SequenceId TaxonomyId" /> + </when> + </conditional> + --> +</inputs> +<outputs> + <!-- If we only accepted one FASTA file, we could use its human name here... --> + <data name="outfile" format="data" label="${dbtype.value_label} BLAST database from ${on_string}"> + <change_format> + <when input="dbtype" value="nucl" format="blastdbn"/> + <when input="dbtype" value="prot" format="blastdbp"/> + </change_format> + </data> +</outputs> +<help> +**What it does** + +Make a BLAST database from one or more FASTA files. + +This is a wrapper for the NCBI BLAST+ tool 'makeblastdb', which is the +replacement for the 'formatdb' tool in the NCBI 'legacy' BLAST suite. + +<!-- +Applying masks to an existing BLAST database will not change the original database; a new database will be created. +For this reason, it's best to apply all masks at once to minimize the number of unnecessary intermediate databases. +--> + +**Documentation** + +http://www.ncbi.nlm.nih.gov/books/NBK1763/ + +**References** + +Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 
25:3389-3402. + +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus +</help> +</tool> |
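The makeblastdb command template above passes all input FASTA files through a single `-in` argument, space separated inside one pair of quotes, and falls back to the title "BLAST Database" when none is given. The following Python sketch assembles an equivalent argument list; the function name `makeblastdb_args`, its parameters, and the example paths are assumptions for illustration, and the list is only built, not executed.

```python
import os


def makeblastdb_args(fasta_files, out_dir, dbtype, title="",
                     parse_seqids=False, hash_index=True):
    """Build an argument list equivalent to the wrapper's command template."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    if not fasta_files:
        raise ValueError("At least one FASTA file is required")
    for name in fasta_files:
        # The template notes that filenames with spaces would presumably break
        # the space separated -in value, so reject them up front.
        if " " in name:
            raise ValueError("Filename contains a space: %r" % name)
    args = ["makeblastdb", "-out", os.path.join(out_dir, "blastdb")]
    if parse_seqids:
        args.append("-parse_seqids")
    if hash_index:
        args.append("-hash_index")
    # One -in value holding every filename, as in the template.
    args += ["-in", " ".join(fasta_files)]
    args += ["-title", title or "BLAST Database"]
    args += ["-dbtype", dbtype]
    return args


if __name__ == "__main__":
    print(makeblastdb_args(["a.fasta", "b.fasta"], "out_files", "nucl"))
```

Handing this list to something like `subprocess.call` keeps the multi-file `-in` value as a single argument without needing shell quoting.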
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_rpsblast_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_rpsblast_wrapper.xml Tue Jul 30 07:33:46 2013 -0400 |
b'@@ -0,0 +1,238 @@\n+<tool id="ncbi_rpsblast_wrapper" name="NCBI BLAST+ rpsblast" version="0.0.4">\n+ <description>Search protein domain database (PSSMs) with protein query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n+ <requirements>\n+ <requirement type="binary">rpsblast</requirement>\n+ <requirement type="package" version="2.2.26+">blast+</requirement>\n+ </requirements>\n+ <version_command>rpsblast -version</version_command>\n+ <command>\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+rpsblast\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#elif $db_opts.db_opts_selector == "histdb":\n+ -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n+#end if\n+-evalue $evalue_cutoff\n+-out "$output1"\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+$adv_opts.filter_query\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <stdio>\n+ <!-- Anything other than zero is an error -->\n+ 
<exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ <!-- In case the return code has not been set properly, check stderr too -->\n+ <regex match="Error:" />\n+ <regex match="Exception:" />\n+ </stdio>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" label="Protein domain database (PSSM)">\n+ <option value="db" selected="True">Locally installed BLAST database</option>\n+\t <!-- TODO - define new datatype\n+ <option value="histdb">BLAST protein domain database from your history</option>\n+\t -->\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Protein domain database">\n+ <options from_file="blastdb_d.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="hidden" value="" /> \n+ </when>\n+\t <!-- TODO - define new datatype\n+ <when value="histdb">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="data" format="blastdbd" label="Protein domain database" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+\t -->\n+ </conditional>\n+ <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n+ <param name="out_format" type="select" label="Output format">\n+ <option value="6">Tabular (standard 12 columns)</option>\n+ <option value="ext" selected="True">Tabular (extended 24 columns)</option>\n+'..b"agments classified in the COGs resource, which focuses primarily\n+on prokaryotes, see http://www.ncbi.nlm.nih.gov/COG/new/\n+\n+*Pfam* - PSSMs from Pfam-A seed alignment database, see\n+http://pfam.sanger.ac.uk/\n+\n+*Smart* - PSSMs from SMART domain alignment database, see\n+http://smart.embl-heidelberg.de/\n+\n+*Tigr* - PSSMs from TIGRFAM database of protein 
families, see\n+http://www.jcvi.org/cms/research/projects/tigrfams/overview/\n+\n+*Prk* - PSSMs from automatically aligned stable clusters in the\n+Protein Clusters database, see\n+http://www.ncbi.nlm.nih.gov/proteinclusters?cmd=search&db=proteinclusters\n+\n+The exact list of domain databases offered will depend on how your\n+local Galaxy has been configured.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output. 
Galaxy now uses this extended 24 column output by default.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a ';'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W327-31.\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n+ </help>\n+</tool>\n" |
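The help text above notes that the 12 extra columns are appended after the standard 12 so that downstream steps can accept either layout. A minimal sketch of such a reader in Python follows. The function name `read_blast_tabular` and the sample row are illustrative assumptions and are not part of the wrappers; only the column names come from the tables above.

```python
import io

# Column names as listed in the help tables above.
STD_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
EXT_COLUMNS = ["sallseqid", "score", "nident", "positive", "gaps", "ppos",
               "qframe", "sframe", "qseq", "sseq", "qlen", "slen"]


def read_blast_tabular(handle):
    """Yield one dict per hit from 12 or 24 column tabular BLAST+ output.

    Hypothetical helper: values are left as strings, and blank lines or
    lines starting with '#' are skipped.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 12:
            names = STD_COLUMNS
        elif len(fields) == 24:
            names = STD_COLUMNS + EXT_COLUMNS
        else:
            raise ValueError("Expected 12 or 24 columns, got %i" % len(fields))
        yield dict(zip(names, fields))


# Made-up 12 column row, used only to exercise the reader.
SAMPLE = "q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-50\t350\n"
hits = list(read_blast_tabular(io.StringIO(SAMPLE)))
```

Because the first 12 keys are the same in both layouts, a filter on `evalue` or `bitscore` written against this reader works unchanged on standard and extended output.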
b |
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml Tue Jul 30 07:33:46 2013 -0400 |
b |
b'@@ -0,0 +1,239 @@\n+<tool id="ncbi_rpstblastn_wrapper" name="NCBI BLAST+ rpstblastn" version="0.0.4">\n+ <description>Search protein domain database (PSSMs) with translated nucleotide query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n+ <requirements>\n+ <requirement type="binary">rpstblastn</requirement>\n+ <requirement type="package" version="2.2.26+">blast+</requirement>\n+ </requirements>\n+ <version_command>rpstblastn -version</version_command>\n+ <command>\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+rpstblastn\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#elif $db_opts.db_opts_selector == "histdb":\n+ -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n+#end if\n+-evalue $evalue_cutoff\n+-out "$output1"\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+##Seems rpstblastn does not currently support multiple threads :(\n+##-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+$adv_opts.filter_query\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+$adv_opts.parse_deflines\n+## End of advanced 
options:\n+#end if\n+ </command>\n+ <stdio>\n+ <!-- Anything other than zero is an error -->\n+ <exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ <!-- In case the return code has not been set properly, check stderr too -->\n+ <regex match="Error:" />\n+ <regex match="Exception:" />\n+ </stdio>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" label="Protein domain database (PSSM)">\n+ <option value="db" selected="True">Locally installed BLAST database</option>\n+ <!-- TODO - define new datatype\n+ <option value="histdb">BLAST protein domain database from your history</option>\n+ -->\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Protein domain database">\n+ <options from_file="blastdb_d.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <!-- TODO - define new datatype\n+ <when value="histdb">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="data" format="blastdbd" label="Protein domain database" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ -->\n+ </conditional>\n+ <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n+ <param name="out_format" type="select" label="Output format">\n+ <option value="6">Tabul'..b"agments classified in the COGs resource, which focuses primarily\n+on prokaryotes, see http://www.ncbi.nlm.nih.gov/COG/new/\n+\n+*Pfam* - PSSMs from Pfam-A seed alignment database, see\n+http://pfam.sanger.ac.uk/\n+\n+*Smart* - PSSMs from SMART domain alignment database, see\n+http://smart.embl-heidelberg.de/\n+\n+*Tigr* - PSSMs from TIGRFAM database of protein families, 
see\n+http://www.jcvi.org/cms/research/projects/tigrfams/overview/\n+\n+*Prk* - PSSMs from automatically aligned stable clusters in the\n+Protein Clusters database, see\n+http://www.ncbi.nlm.nih.gov/proteinclusters?cmd=search&db=proteinclusters\n+\n+The exact list of domain databases offered will depend on how your\n+local Galaxy has been configured.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output. 
Galaxy now uses this extended 24 column output by default.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a ';'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W327-31.\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n+ </help>\n+</tool>\n" |
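The `<stdio>` block in these wrappers treats any non-zero exit code as a failure and additionally looks for `Error:` or `Exception:` in the tool output. The decision rule can be sketched in Python as below. This is only an illustration of the rule as written in the XML, not Galaxy's implementation; the function name `job_failed` is hypothetical, and checking stderr alone is an assumption taken from the XML comment.

```python
import re

# Patterns copied from the <regex match="..."/> entries in the wrapper.
ERROR_PATTERNS = [re.compile("Error:"), re.compile("Exception:")]


def job_failed(exit_code, stderr_text):
    """Return True if the job should be flagged as failed.

    The ranges "1:" and ":-1" together cover every exit code except zero.
    The regex checks are a fallback for when the exit code is zero but the
    tool still reported a problem.
    """
    if exit_code != 0:
        return True
    return any(pattern.search(stderr_text) for pattern in ERROR_PATTERNS)
```

The matching is case-sensitive, so a line such as `Warning: ...` on stderr does not by itself mark the job as failed.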
b |
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_tblastn_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_tblastn_wrapper.xml Tue Jul 30 07:33:46 2013 -0400 |
b |
b'@@ -0,0 +1,340 @@\n+<tool id="ncbi_tblastn_wrapper" name="NCBI BLAST+ tblastn" version="0.0.20">\n+ <description>Search translated nucleotide database with protein query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n+ <requirements>\n+ <requirement type="binary">tblastn</requirement>\n+ <requirement type="package" version="2.2.26+">blast+</requirement>\n+ </requirements>\n+ <version_command>tblastn -version</version_command>\n+ <command>\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+tblastn\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#elif $db_opts.db_opts_selector == "histdb":\n+ -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-evalue $evalue_cutoff\n+-out "$output1"\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+-db_gencode $adv_opts.db_gencode\n+$adv_opts.filter_query\n+-matrix $adv_opts.matrix\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+##Ungapped disabled for now - see comments 
below\n+##$adv_opts.ungapped\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <stdio>\n+ <!-- Anything other than zero is an error -->\n+ <exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ <!-- In case the return code has not been set properly, check stderr too -->\n+ <regex match="Error:" />\n+ <regex match="Exception:" />\n+ </stdio>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" label="Subject database/sequences">\n+ <option value="db" selected="True">Locally installed BLAST database</option>\n+ <option value="histdb">BLAST database from your history</option>\n+ <option value="file">FASTA file from your history (see warning note below)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Nucleotide BLAST database">\n+ <options from_file="blastdb.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <when value="histdb">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="data" format="blastdbn" label="Nucleotide BLAST database" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="data" '..b"mark\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. \n+\n+-----\n+\n+**What it does**\n+\n+Search a *translated nucleotide database* using a *protein query*,\n+using the NCBI BLAST+ tblastn command line tool.\n+\n+.. 
class:: warningmark\n+\n+You can also search against a FASTA file of subject nucleotide\n+sequences. This is *not* advised because it is slower (only one\n+CPU is used), but more importantly gives e-values for pairwise\n+searches (very small e-values which will look overly significant).\n+In most cases you should instead turn the other FASTA file into a\n+database first using *makeblastdb* and search against that.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output. 
Galaxy now uses this extended 24 column output by default.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a ';'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n+ </help>\n+</tool>\n" |
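The Cheetah `<command>` templates in these wrappers pass `-max_target_seqs` and `-word_size` only when the value is non-empty and greater than zero, and expand the `ext` output format into an explicit `-outfmt` column list. The same logic is sketched below as plain Python that assembles an argument list. The function `build_blast_args` and its parameter names are hypothetical; the command line flags and the column list are taken from the templates above, and only the `-db` branch of the database selection is covered.

```python
# Column list copied from the wrappers' -outfmt line for the "ext" format.
EXTENDED_OUTFMT = ("6 std sallseqid score nident positive gaps ppos "
                   "qframe sframe qseq sseq qlen slen")


def build_blast_args(program, query, database, evalue, out_file,
                     out_format="ext", max_hits="", word_size=""):
    """Assemble a BLAST+ argument list the way the Cheetah template does."""
    args = [program, "-query", query, "-db", database,
            "-evalue", str(evalue), "-out", out_file]
    if str(out_format) == "ext":
        args += ["-outfmt", EXTENDED_OUTFMT]
    else:
        args += ["-outfmt", str(out_format)]
    # Same guard as the template: skip empty values and values <= 0.
    for flag, value in (("-max_target_seqs", max_hits),
                        ("-word_size", word_size)):
        if str(value) and int(str(value)) > 0:
            args += [flag, str(value)]
    return args
```

Returning a list rather than a single string means the multi-word `-outfmt` value stays one argument if the list is later handed to `subprocess.run`, which is the job the double quotes do in the template.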
b |
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/ncbi_tblastx_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/ncbi_tblastx_wrapper.xml Tue Jul 30 07:33:46 2013 -0400 |
b |
b'@@ -0,0 +1,294 @@\n+<tool id="ncbi_tblastx_wrapper" name="NCBI BLAST+ tblastx" version="0.0.20">\n+ <description>Search translated nucleotide database with translated nucleotide query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n+ <requirements>\n+ <requirement type="binary">tblastx</requirement>\n+ <requirement type="package" version="2.2.26+">blast+</requirement>\n+ </requirements>\n+ <version_command>tblastx -version</version_command>\n+ <command>\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+tblastx\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#elif $db_opts.db_opts_selector == "histdb":\n+ -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-query_gencode $query_gencode\n+-evalue $evalue_cutoff\n+-out "$output1"\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+-db_gencode $adv_opts.db_gencode\n+$adv_opts.filter_query\n+$adv_opts.strand\n+-matrix $adv_opts.matrix\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size 
$adv_opts.word_size\n+#end if\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <stdio>\n+ <!-- Anything other than zero is an error -->\n+ <exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ <!-- In case the return code has not been set properly, check stderr too -->\n+ <regex match="Error:" />\n+ <regex match="Exception:" />\n+ </stdio>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" label="Subject database/sequences">\n+ <option value="db" selected="True">Locally installed BLAST database</option>\n+ <option value="histdb">BLAST database from your history</option>\n+ <option value="file">FASTA file from your history (see warning note below)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Nucleotide BLAST database">\n+ <options from_file="blastdb.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <when value="histdb">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="data" format="blastdbn" label="Nucleotide BLAST database" />\n+ <param name="subject" type="hidden" value="" />\n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" />\n+ <param name="histdb" type="hidden" value="" />\n+ <param name="subject" type="data" format'..b"mark\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. \n+\n+-----\n+\n+**What it does**\n+\n+Search a *translated nucleotide database* using a *translated nucleotide query*,\n+using the NCBI BLAST+ tblastx command line tool.\n+\n+.. 
class:: warningmark\n+\n+You can also search against a FASTA file of subject nucleotide\n+sequences. This is *not* advised because it is slower (only one\n+CPU is used), but more importantly gives e-values for pairwise\n+searches (very small e-values which will look overly significant).\n+In most cases you should instead turn the other FASTA file into a\n+database first using *makeblastdb* and search against that.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output. 
Galaxy now uses this extended 24 column output by default.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a ';'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n+ </help>\n+</tool>\n" |
b |
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/repository_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/repository_dependencies.xml Tue Jul 30 07:33:46 2013 -0400 |
b |
@@ -0,0 +1,5 @@ +<?xml version="1.0"?> +<repositories description="This requires the BLAST datatype definitions (e.g. the BLAST XML format)."> +<!-- Revision 4:f9a7783ed7b6 on the main (and test) tool shed is v0.0.14 which added BLAST databases --> +<repository changeset_revision="f9a7783ed7b6" name="blast_datatypes" owner="devteam" toolshed="http://testtoolshed.g2.bx.psu.edu" /> +</repositories> |
b |
diff -r c1a6e5aefee0 -r 688f3fb09a6a ncbi_blast_plus/tool_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ncbi_blast_plus/tool_dependencies.xml Tue Jul 30 07:33:46 2013 -0400 |
b |
@@ -0,0 +1,20 @@ +<?xml version="1.0"?> +<tool_dependency> + <package name="blast+" version="2.2.26+"> + <install version="1.0"> + <actions> + <action type="download_by_url">ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.26/ncbi-blast-2.2.26+-src.tar.gz</action> + <action type="shell_command">cd c++ && ./configure --prefix=$INSTALL_DIR && make && make install</action> + <action type="set_environment"> + <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable> + </action> + </actions> + </install> + <readme> +Downloads and compiles BLAST+ from the NCBI, which assumes you have +all the required build dependencies installed. See: +http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download + </readme> + </package> +</tool_dependency> + |
b |
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/blastxml_to_tabular.py --- a/tools/ncbi_blast_plus/blastxml_to_tabular.py Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
[ |
b'@@ -1,261 +0,0 @@\n-#!/usr/bin/env python\n-"""Convert a BLAST XML file to tabular output.\n-\n-Takes three command line options, input BLAST XML filename, output tabular\n-BLAST filename, output format (std for standard 12 columns, or ext for the\n-extended 24 columns offered in the BLAST+ wrappers).\n-\n-The 12 columns output are \'qseqid sseqid pident length mismatch gapopen qstart\n-qend sstart send evalue bitscore\' or \'std\' at the BLAST+ command line, which\n-mean:\n- \n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The additional columns offered in the Galaxy BLAST+ wrappers are:\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a \';\'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-Most of these fields are 
given explicitly in the XML file, while others, like\n-the percentage identity and the number of gap openings, must be calculated.\n-\n-Be aware that the sequence in the extended tabular output or XML direct from\n-BLAST+ may or may not use XXXX masking on regions of low complexity. This\n-can throw off the calculation of percentage identity and gap openings.\n-[In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,\n-with these numbers changing depending on whether or not the low complexity\n-filter is used.]\n-\n-This script attempts to produce identical output to what BLAST+ would have done.\n-However, check this with "diff -b ..." since BLAST+ sometimes includes an extra\n-space character (probably a bug).\n-"""\n-import sys\n-import re\n-\n-if "-v" in sys.argv or "--version" in sys.argv:\n- print "v0.0.12"\n- sys.exit(0)\n-\n-if sys.version_info[:2] >= ( 2, 5 ):\n- try:\n- from xml.etree import cElementTree as ElementTree\n- except ImportError:\n- from xml.etree import ElementTree as ElementTree\n-else:\n- from galaxy import eggs\n- import pkg_resources; pkg_resources.require( "elementtree" )\n- from elementtree import ElementTree\n-\n-def stop_err( msg ):\n- sys.stderr.write("%s\\n" % msg)\n- sys.exit(1)\n-\n-#Parse Command Line\n-try:\n- in_file, out_file, out_fmt = sys.argv[1:]\n-except:\n- stop_err("Expect 3 arguments: input BLAST XML file, output tabular file, out format (std or ext)")\n-\n-if out_fmt == "std":\n- extended = False\n-elif out_fmt == "x22":\n- stop_err("Format argument x22 has been replaced with ext (extended 24 columns)")\n-elif out_fmt == "ext":\n- extended = True\n-else:\n- stop_err("Format argument should be std (12 column) or ext (extended 24 columns)")\n-\n-\n-# get an iterable\n-try: \n- context = ElementTree.iterparse(in_file, events=("start", "end")'..b'")\n- xx = sum(1 for q,h in zip(q_seq, h_seq) if q=="X" and h=="X")\n- if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= expected_mismatch + 
xx):\n- stop_err("%s vs %s mismatches, expected %i <= %i <= %i" \\\n- % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),\n- int(mismatch), expected_mismatch))\n-\n- #TODO - Remove this alternative identity calculation and test\n- #once satisfied there are no problems\n- expected_identity = sum(1 for q,h in zip(q_seq, h_seq) if q == h)\n- if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):\n- stop_err("%s vs %s identities, expected %i <= %i <= %i" \\\n- % (qseqid, sseqid, expected_identity, int(nident),\n- expected_identity + q_seq.count("X")))\n- \n-\n- evalue = hsp.findtext("Hsp_evalue")\n- if evalue == "0":\n- evalue = "0.0"\n- else:\n- evalue = "%0.0e" % float(evalue)\n- \n- bitscore = float(hsp.findtext("Hsp_bit-score"))\n- if bitscore < 100:\n- #Seems to show one decimal place for lower scores\n- bitscore = "%0.1f" % bitscore\n- else:\n- #Note BLAST does not round to nearest int, it truncates\n- bitscore = "%i" % bitscore\n-\n- values = [qseqid,\n- sseqid,\n- pident,\n- length, #hsp.findtext("Hsp_align-len")\n- str(mismatch),\n- gapopen,\n- hsp.findtext("Hsp_query-from"), #qstart,\n- hsp.findtext("Hsp_query-to"), #qend,\n- hsp.findtext("Hsp_hit-from"), #sstart,\n- hsp.findtext("Hsp_hit-to"), #send,\n- evalue, #hsp.findtext("Hsp_evalue") in scientific notation\n- bitscore, #hsp.findtext("Hsp_bit-score") rounded\n- ]\n-\n- if extended:\n- sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(">"))\n- #print hit_def, "-->", sallseqid\n- positive = hsp.findtext("Hsp_positive")\n- ppos = "%0.2f" % (100*float(positive)/float(length))\n- qframe = hsp.findtext("Hsp_query-frame")\n- sframe = hsp.findtext("Hsp_hit-frame")\n- if blast_program == "blastp":\n- #Probably a bug in BLASTP that they use 0 or 1 depending on format\n- if qframe == "0": qframe = "1"\n- if sframe == "0": sframe = "1"\n- slen = int(hit.findtext("Hit_len"))\n- values.extend([sallseqid,\n- hsp.findtext("Hsp_score"), #score,\n- nident,\n- 
positive,\n- hsp.findtext("Hsp_gaps"), #gaps,\n- ppos,\n- qframe,\n- sframe,\n- #NOTE - for blastp, XML shows original seq, tabular uses XXX masking\n- q_seq,\n- h_seq,\n- str(qlen),\n- str(slen),\n- ])\n- #print "\\t".join(values) \n- outfile.write("\\t".join(values) + "\\n")\n- # prevents ElementTree from growing large datastructure\n- root.clear()\n- elem.clear()\n-outfile.close()\n' |
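Note on the removed script above: its E-value and bit score formatting can be sketched in isolation as below. This is a minimal Python 3 sketch under stated assumptions, not the script itself. The function names format_evalue and format_bitscore are hypothetical; the rules (zero shown as "0.0", scientific notation with no decimal places, one decimal place below 100, truncation rather than rounding from 100 upwards) are taken from the code and comments in the diff.

```python
def format_evalue(text):
    """Format an Hsp_evalue string as in the tabular conversion (hypothetical helper)."""
    if text == "0":
        # A literal zero is written as 0.0
        return "0.0"
    # Otherwise scientific notation with no decimal places
    return "%0.0e" % float(text)


def format_bitscore(value):
    """Format an Hsp_bit-score value as in the tabular conversion (hypothetical helper)."""
    value = float(value)
    if value < 100:
        # Lower scores are shown with one decimal place
        return "%0.1f" % value
    # "%i" truncates towards zero, it does not round to the nearest integer
    return "%i" % value
```

Whether this matches BLAST+ output in every case should still be checked with "diff -b", as the script's own docstring advises.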
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/blastxml_to_tabular.xml --- a/tools/ncbi_blast_plus/blastxml_to_tabular.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,137 +0,0 @@ -<tool id="blastxml_to_tabular" name="BLAST XML to tabular" version="0.0.11"> - <description>Convert BLAST XML output to tabular</description> - <version_command interpreter="python">blastxml_to_tabular.py --version</version_command> - <command interpreter="python"> - blastxml_to_tabular.py $blastxml_file $tabular_file $out_format - </command> - <stdio> - <!-- Anything other than zero is an error --> - <exit_code range="1:" /> - <exit_code range=":-1" /> - </stdio> - <inputs> - <param name="blastxml_file" type="data" format="blastxml" label="BLAST results as XML"/> - <param name="out_format" type="select" label="Output format"> - <option value="std">Tabular (standard 12 columns)</option> - <option value="ext" selected="True">Tabular (extended 24 columns)</option> - </param> - </inputs> - <outputs> - <data name="tabular_file" format="tabular" label="BLAST results as tabular" /> - </outputs> - <requirements> - </requirements> - <tests> - <test> - <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin.tabluar --> - <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" /> - <param name="out_format" value="ext" /> - <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin_22c.tabluar --> - <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted_ext.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastp_sample.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastp output --> - <output name="tabular_file" 
file="blastp_sample_converted.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastx output --> - <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" /> - <param name="out_format" value="ext" /> - <!-- Note this has some white space and XXXX masking differences from the actual blastx output --> - <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted_ext.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastx_sample.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastx output --> - <output name="tabular_file" file="blastx_sample_converted.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastp output --> - <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_std.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" /> - <param name="out_format" value="ext" /> - <!-- Note this has some white space differences from the actual blastp output --> - <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_ext.tabular" ftype="tabular" /> - </test> - </tests> - <help> - -**What it does** - -NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of -formats including tabular and a more detailed XML format. 
A complex workflow -may need both the XML and the tabular output - but running BLAST twice is -slow and wasteful. - -This tool takes the BLAST XML output and can convert it into the -standard 12 column tabular equivalent: - -====== ========= ============================================ -Column NCBI name Description ------- --------- -------------------------------------------- - 1 qseqid Query Seq-id (ID of your sequence) - 2 sseqid Subject Seq-id (ID of the database hit) - 3 pident Percentage of identical matches - 4 length Alignment length - 5 mismatch Number of mismatches - 6 gapopen Number of gap openings - 7 qstart Start of alignment in query - 8 qend End of alignment in query - 9 sstart Start of alignment in subject (database hit) - 10 send End of alignment in subject (database hit) - 11 evalue Expectation value (E-value) - 12 bitscore Bit score -====== ========= ============================================ - -The BLAST+ tools can optionally output additional columns of information, -but this takes longer to calculate. Most (but not all) of these columns are -included by selecting the extended tabular output. The extra columns are -included *after* the standard 12 columns. This is so that you can write -workflow filtering steps that accept either the 12 or 22 column tabular -BLAST output. This tool now uses this extended 24 column output by default. 
- -====== ============= =========================================== -Column NCBI name Description ------- ------------- ------------------------------------------- - 13 sallseqid All subject Seq-id(s), separated by a ';' - 14 score Raw score - 15 nident Number of identical matches - 16 positive Number of positive-scoring matches - 17 gaps Total number of gaps - 18 ppos Percentage of positive-scoring matches - 19 qframe Query frame - 20 sframe Subject frame - 21 qseq Aligned part of query sequence - 22 sseq Aligned part of subject sequence - 23 qlen Query sequence length - 24 slen Subject sequence length -====== ============= =========================================== - -Beware that the XML file (and thus the conversion) and the tabular output -direct from BLAST+ may differ in the presence of XXXX masking on regions -low complexity (columns 21 and 22), and thus also calculated figures like -the percentage identity (column 3). - -**References** - -This wrapper is available to install into other Galaxy Instances via the Galaxy -Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus - </help> -</tool> |
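Note on the help text above: because the extra columns come after the standard 12, a downstream filter can accept either layout. The following is a minimal Python 3 sketch of that idea, assuming tab-separated input lines. The function name parse_blast_line is hypothetical; the column names are the NCBI names listed in the two tables.

```python
# NCBI column names, in the order given in the help text tables
STANDARD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXTENDED_COLUMNS = [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen",
]


def parse_blast_line(line):
    """Map one tabular BLAST line to a dict keyed by NCBI column name.

    Accepts either the standard 12 column or the extended 24 column layout.
    Values are left as strings; callers convert what they need.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == len(STANDARD_COLUMNS):
        names = STANDARD_COLUMNS
    elif len(fields) == len(STANDARD_COLUMNS) + len(EXTENDED_COLUMNS):
        names = STANDARD_COLUMNS + EXTENDED_COLUMNS
    else:
        raise ValueError("Expected 12 or 24 columns, got %i" % len(fields))
    return dict(zip(names, fields))
```

A filtering step written this way can use the first 12 keys unconditionally and test for keys such as "qlen" before relying on the extended columns.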
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blast_plus.txt --- a/tools/ncbi_blast_plus/ncbi_blast_plus.txt Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,151 +0,0 @@ -Galaxy wrappers for NCBI BLAST+ suite -===================================== - -These wrappers are copyright 2010-2013 by Peter Cock, The James Hutton Institute -(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. -See the licence text below. - -Currently tested with NCBI BLAST 2.2.26+ (i.e. version 2.2.26 of BLAST+), -and does not work with the NCBI 'legacy' BLAST suite (e.g. blastall). - -Note that these wrappers (and the associated datatypes) were originally -distributed as part of the main Galaxy repository, but as of August 2012 -moved to the Galaxy Tool Shed as 'ncbi_blast_plus' (and 'blast_datatypes'). -My thanks to Dannon Baker from the Galaxy development team for his assistance -with this. - -These wrappers are available from the Galaxy Tool Shed at: -http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus - - -Automated Installation -====================== - -Galaxy should be able to automatically install the dependencies, i.e. the -'blast_datatypes' repository which defines the BLAST XML file format -('blastxml') and protein and nucleotide BLAST databases ('blastdbp' and -'blastdbn'). - -You must tell Galaxy about any system level BLAST databases using configuration -files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein -databases like NR), and blastdb_d.loc (protein domain databases like CDD or -SMART) which are located in the tool-data/ folder. Sample files are included -which explain the tab-based format to use. 
- -You can download the NCBI provided databases as tar-balls from here: -ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (nucleotide and protein databases like NR) -ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ (domain databases like CDD) - - -Manual Installation -=================== - -For those not using Galaxy's automated installation from the Tool Shed, put -the XML and Python files in the tools/ncbi_blast_plus/ folder and add the XML -files to your tool_conf.xml as normal (and do the same in tool_conf.xml.sample -in order to run the unit tests). For example, use: - - <section name="NCBI BLAST+" id="ncbi_blast_plus_tools"> - <tool file="ncbi_blast_plus/ncbi_blastn_wrapper.xml" /> - <tool file="ncbi_blast_plus/ncbi_blastp_wrapper.xml" /> - <tool file="ncbi_blast_plus/ncbi_blastx_wrapper.xml" /> - <tool file="ncbi_blast_plus/ncbi_tblastn_wrapper.xml" /> - <tool file="ncbi_blast_plus/ncbi_tblastx_wrapper.xml" /> - <tool file="ncbi_blast_plus/ncbi_makeblastdb.xml" /> - <tool file="ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml" /> - <tool file="ncbi_blast_plus/ncbi_blastdbcmd_info.xml" /> - <tool file="ncbi_blast_plus/ncbi_rpsblast_wrapper.xml" /> - <tool file="ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml" /> - <tool file="ncbi_blast_plus/blastxml_to_tabular.xml" /> - </section> - -You will also need to install 'blast_datatypes' from the Tool Shed. This -defines the BLAST XML file format ('blastxml') and protein and nucleotide -BLAST databases composite file formats ('blastdbp' and 'blastdbn'). - -As described above for an automated installation, you must also tell Galaxy -about any system level BLAST databases using the tool-data/blastdb*.loc files. - -You must install the NCBI BLAST+ standalone tools somewhere on the system -path. Currently the unit tests are written using "BLAST 2.2.26+". 
- -Run the functional tests (adjusting the section identifier to match your -tool_conf.xml.sample file): - -./run_functional_tests.sh -sid NCBI_BLAST+-ncbi_blast_plus_tools - - -History -======= - -v0.0.11 - Final revision as part of the Galaxy main repository, and the - first release via the Tool Shed -v0.0.12 - Implements genetic code option for translation searches. - - Changes <parallelism> to 1000 sequences at a time (to cope with - very large sets of queries where BLAST+ can become memory hungry) - - Include warning that BLAST+ with subject FASTA gives pairwise - e-values -v0.0.13 - Use the new error handling options in Galaxy (the previously - bundled hide_stderr.py script is no longer needed). -v0.0.14 - Support for makeblastdb and blastdbinfo with local BLAST databases - in the history (using work from Edward Kirton), requires v0.0.14 - of the 'blast_datatypes' repository from the Tool Shed. -v0.0.15 - Stronger warning in help text against searching against subject - FASTA files (better looking e-values than you might be expecting). -v0.0.16 - Added repository_dependencies.xml for automates installation of the - 'blast_datatypes' repository from the Tool Shed. -v0.0.17 - The BLAST+ search tools now default to extended tabular output - (all too often our users where having to re-run searches just to - get one of the missing columns like query or subject length) -v0.0.18 - Defensive quoting of filenames in case of spaces (where possible, - BLAST+ handling of some mult-file arguments is problematic). -v0.0.19 - Added wrappers for rpsblast and rpstblastn, and new blastdb_d.loc - for the domain databases they use (e.g. CDD, PFAM or SMART). - - Correct case of exception regular expression (for error handling - fall-back in case the return code is not set properly). - - Clearer naming of output files. -v0.0.20 - Added unit tests for BLASTN and TBLASTX. - - Fallback on ElementTree if cElementTree missing in XML to tabular. 
- - Link to Tool Shed added to help text and this documentation. - - Tweak dependency on blast_datatypes to also work on Test Tool Shed - - -Developers -========== - -This script and related tools are being developed on the 'tools' branch of the -following Mercurial repository: -https://bitbucket.org/peterjc/galaxy-central/ - -For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball I use -the following command from the Galaxy root folder: - -$ ./tools/ncbi_blast_plus/make_ncbi_blast_plus.sh - -This simplifies ensuring a consistent set of files is bundled each time, -including all the relevant test files. - - -Licence (MIT/BSD style) -======================= - -Permission to use, copy, modify, and distribute this software and its -documentation with or without modifications and for any purpose and -without fee is hereby granted, provided that any copyright notices -appear in all copies and that both those copyright notices and this -permission notice appear in supporting documentation, and that the -names of the contributors or copyright holders not be used in -advertising or publicity pertaining to distribution of the software -without specific prior permission. - -THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL -WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED -WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE -CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT -OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS -OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE -OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE -OR PERFORMANCE OF THIS SOFTWARE. - -NOTE: This is the licence for the Galaxy Wrapper only. NCBI BLAST+ and -associated data files are available and licenced separately. |
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastdbcmd_info.xml --- a/tools/ncbi_blast_plus/ncbi_blastdbcmd_info.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,67 +0,0 @@ -<tool id="ncbi_blastdbcmd_info" name="NCBI BLAST+ database info" version="0.0.6"> - <description>Show BLAST database information from blastdbcmd</description> - <requirements> - <requirement type="binary">blastdbcmd</requirement> - <requirement type="package" version="2.2.26+">blast+</requirement> - </requirements> - <version_command>blastdbcmd -version</version_command> - <command> -blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}" -info -out "$info" - </command> - <stdio> - <!-- Anything other than zero is an error --> - <exit_code range="1:" /> - <exit_code range=":-1" /> - <!-- Suspect blastdbcmd sometimes fails to set error level --> - <regex match="Error:" /> - <regex match="Exception:" /> - </stdio> - <inputs> - <conditional name="db_opts"> - <param name="db_type" type="select" label="Type of BLAST database"> - <option value="nucl" selected="True">Nucleotide</option> - <option value="prot">Protein</option> - </param> - <when value="nucl"> - <param name="database" type="select" label="Nucleotide BLAST database"> - <options from_file="blastdb.loc"> - <column name="value" index="0"/> - <column name="name" index="1"/> - <column name="path" index="2"/> - </options> - </param> - </when> - <when value="prot"> - <param name="database" type="select" label="Protein BLAST database"> - <options from_file="blastdb_p.loc"> - <column name="value" index="0"/> - <column name="name" index="1"/> - <column name="path" index="2"/> - </options> - </param> - </when> - </conditional> - </inputs> - <outputs> - <data name="info" format="txt" label="${db_opts.database.fields.name} info" /> - </outputs> - <help> - -**What it does** - -Calls the NCBI BLAST+ blastdbcmd command line tool with the -info -switch to give summary information about a BLAST database, such as -the size (number of sequences and total length) and date. - -------- - -**References** - -Altschul et al. 
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402. - -Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005. - -This wrapper is available to install into other Galaxy Instances via the Galaxy -Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus - </help> -</tool> |
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml --- a/tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,139 +0,0 @@ -<tool id="ncbi_blastdbcmd_wrapper" name="NCBI BLAST+ blastdbcmd entry(s)" version="0.0.6"> - <description>Extract sequence(s) from BLAST database</description> - <requirements> - <requirement type="binary">blastdbcmd</requirement> - <requirement type="package" version="2.2.26+">blast+</requirement> - </requirements> - <version_command>blastdbcmd -version</version_command> - <command> -## The command is a Cheetah template which allows some Python based syntax. -## Lines starting hash hash are comments. Galaxy will turn newlines into spaces -blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}" - -##TODO: What about -ctrl_a and -target_only as advanced options? - -#if $id_opts.id_type=="file": --entry_batch "$id_opts.entries" -#else: -##Perform some simple search/replaces to remove whitespace -##and make it comma separated, and escape any pipe characters --entry "$id_opts.entries.replace('\r',',').replace('\n',',').replace(' ','').replace(',,',',').replace(',,',',').strip(',').replace('|','\|')" -#end if - -##When building a BLAST database, to ensure unique IDs makeblastdb will -##do things like turning a FASTA entry with ID of ERP44 into lcl|ERP44 -##(if using -parse_seqids) or simply assign it an ID using the record -##number like gnl|BL_ORD_ID|123 (to cope with duplicate IDs in the FASTA -##file). In -parse_seqids mode, a duplicate FASTA ID gives an error. -## -##The BLAST plain text and XML output will contain these BLAST IDs, but -##the tabular output does not (at least, not in BLAST 2.2.25+). -##Therefore in general, Galaxy users won't care about the (internal) -##BLAST identifiers. -## -##The blastdbcmd FASTA output will also contain these IDs, but in the -##context of the BLAST tabular output they are not helpful. Therefore -##to recover the original ID as used in the FASTA file for makeblastdb -##we need a litte post processing. -## -##We remove the NCBI's lcl|... 
or gnl|BL_ORD_ID|123 prefixes -##using sed, however the exact syntax differs for Mac OS X's sed - -#if str($outfmt)=="blastid": --out "$seq" -#else if sys.platform == "darwin": -| sed -E 's/^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )/>/1' > "$seq" -#else: -| sed 's/>\(lcl|\|gnl|BL_ORD_ID|[0-9]* \)/>/1' > "$seq" -#end if - </command> - <stdio> - <!-- Anything other than zero is an error --> - <exit_code range="1:" /> - <exit_code range=":-1" /> - <!-- Suspect blastdbcmd sometimes fails to set error level --> - <regex match="Error:" /> - <regex match="Exception:" /> - </stdio> - <inputs> - <conditional name="db_opts"> - <param name="db_type" type="select" label="Type of BLAST database"> - <option value="nucl" selected="True">Nucleotide</option> - <option value="prot">Protein</option> - </param> - <when value="nucl"> - <param name="database" type="select" label="Nucleotide BLAST database"> - <options from_file="blastdb.loc"> - <column name="value" index="0"/> - <column name="name" index="1"/> - <column name="path" index="2"/> - </options> - </param> - </when> - <when value="prot"> - <param name="database" type="select" label="Protein BLAST database"> - <options from_file="blastdb_p.loc"> - <column name="value" index="0"/> - <column name="name" index="1"/> - <column name="path" index="2"/> - </options> - </param> - </when> - </conditional> - <conditional name="id_opts"> - <param name="id_type" type="select" label="Type of identifier list"> - <option value="file">From file</option> - <option value="prompt">User entered</option> - </param> - <when value="file"> - <param name="entries" type="data" format="txt,tabular" label="Sequence identifier(s)" help="Plain text file with one ID per line (i.e. single column tabular file)"/> - </when> - <when value="prompt"> - <param name="entries" type="text" label="Sequence identifier(s)" help="Comma or new line separated list." 
optional="False" area="True" size="10x30"/> - </when> - </conditional> - <param name="outfmt" type="select" label="Output format"> - <option value="original">FASTA with original identifiers</option> - <option value="blastid">FASTA with BLAST assigned identifiers</option> - </param> - </inputs> - <outputs> - <data name="seq" format="fasta" label="Sequences from ${db_opts.database.fields.name}" /> - </outputs> - <help> - -**What it does** - -Extracts FASTA formatted sequences from a BLAST database -using the NCBI BLAST+ blastdbcmd command line tool. - -.. class:: warningmark - -**BLAST assigned identifiers** - -When a BLAST database is constructed from a FASTA file, the -original identifiers can be replaced with BLAST assigned -identifiers, partly to ensure uniqueness. e.g. Sometimes -a prefix of 'lcl|' is added (lcl is short for local), -or an arbitrary name starting 'gnl|BL_ORD_ID|' is created. - -If you are using the tabular output from BLAST, it will contain -the original identifiers - not the BLAST assigned identifiers -suitable for use with the blastdbcmd tool. - -If you are using the XML or plain text output, this will also -contain the BLAST assigned identifiers. However, this means -getting a list of BLAST assigned identifiers isn't straightforward. - -------- - -**References** - -Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402. - -Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005. - -This wrapper is available to install into other Galaxy Instances via the Galaxy -Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus - </help> -</tool> |
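Note on the wrapper above: the sed post-processing that removes the 'lcl|' or 'gnl|BL_ORD_ID|123 ' prefix from FASTA header lines can be sketched in Python 3 as below. This is an illustration under stated assumptions, not part of the wrapper. The names BLAST_ID_PREFIX and restore_original_id are hypothetical, and the pattern follows the anchored (Mac OS X) form of the sed expression, so only text at the very start of a header line is touched.

```python
import re

# Matches ">lcl|" or ">gnl|BL_ORD_ID|<digits> " at the start of a FASTA header
BLAST_ID_PREFIX = re.compile(r"^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )")


def restore_original_id(line):
    """Strip a BLAST assigned identifier prefix from a FASTA header line.

    Lines without such a prefix (including sequence lines) are returned unchanged.
    """
    return BLAST_ID_PREFIX.sub(">", line, count=1)
```

As with the sed version, the 'gnl|BL_ORD_ID|' form is only removed when the number is followed by a space, which is where the original FASTA title begins.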
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastn_wrapper.xml --- a/tools/ncbi_blast_plus/ncbi_blastn_wrapper.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,253 +0,0 @@\n-<tool id="ncbi_blastn_wrapper" name="NCBI BLAST+ blastn" version="0.0.20">\n- <description>Search nucleotide database with nucleotide query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n- <requirements>\n- <requirement type="binary">blastn</requirement>\n- <requirement type="package" version="2.2.26+">blast+</requirement>\n- </requirements>\n- <version_command>blastn -version</version_command>\n- <command>\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-blastn\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#elif $db_opts.db_opts_selector == "histdb":\n- -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--task $blast_type\n--evalue $evalue_cutoff\n--out "$output1"\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n-$adv_opts.filter_query\n-$adv_opts.strand\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-$adv_opts.ungapped\n-$adv_opts.parse_deflines\n-## End of advanced 
options:\n-#end if\n- </command>\n- <stdio>\n- <!-- Anything other than zero is an error -->\n- <exit_code range="1:" />\n- <exit_code range=":-1" />\n- <!-- In case the return code has not been set propery check stderr too -->\n- <regex match="Error:" />\n- <regex match="Exception:" />\n- </stdio>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" label="Subject database/sequences">\n- <option value="db" selected="True">Locally installed BLAST database</option>\n- <option value="histdb">BLAST database from your history</option>\n- <option value="file">FASTA file from your history (see warning note below)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Nucleotide BLAST database">\n- <options from_file="blastdb.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="hidden" value="" /> \n- </when>\n- <when value="histdb">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="data" format="blastdbn" label="Nucleotide BLAST database" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="data" format="fasta" label="Nucleotide FASTA file to use as database"/> \n- </wh'..b"rk\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *nucleotide database* using a *nucleotide query*,\n-using the NCBI BLAST+ blastn command line tool.\n-Algorithms include blastn, megablast, and discontiguous megablast.\n-\n-.. 
class:: warningmark\n-\n-You can also search against a FASTA file of subject nucleotide\n-sequences. This is *not* advised because it is slower (only one\n-CPU is used), but more importantly gives e-values for pairwise\n-searches (very small e-values which will look overly signficiant).\n-In most cases you should instead turn the other FASTA file into a\n-database first using *makeblastdb* and search against that.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output. 
Galaxy now uses this extended 24 column output by default.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a ';'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Zhang et al. A Greedy Algorithm for Aligning DNA Sequences. 2000. JCB: 203-214.\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n- </help>\n-</tool>\n" |
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastp_wrapper.xml --- a/tools/ncbi_blast_plus/ncbi_blastp_wrapper.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,308 +0,0 @@\n-<tool id="ncbi_blastp_wrapper" name="NCBI BLAST+ blastp" version="0.0.20">\n- <description>Search protein database with protein query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n- <requirements>\n- <requirement type="binary">blastp</requirement>\n- <requirement type="package" version="2.2.26+">blast+</requirement>\n- </requirements>\n- <version_command>blastp -version</version_command>\n- <command>\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-blastp\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#elif $db_opts.db_opts_selector == "histdb":\n- -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--task $blast_type\n--evalue $evalue_cutoff\n--out "$output1"\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n-$adv_opts.filter_query\n--matrix $adv_opts.matrix\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-##Ungapped disabled for now - see comments 
below\n-##$adv_opts.ungapped\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <stdio>\n- <!-- Anything other than zero is an error -->\n- <exit_code range="1:" />\n- <exit_code range=":-1" />\n- <!-- In case the return code has not been set propery check stderr too -->\n- <regex match="Error:" />\n- <regex match="Exception:" />\n- </stdio>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" label="Subject database/sequences">\n- <option value="db" selected="True">Locally installed BLAST database</option>\n- <option value="histdb">BLAST database from your history</option>\n- <option value="file">FASTA file from your history (see warning note below)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Protein BLAST database">\n- <options from_file="blastdb_p.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="hidden" value="" /> \n- </when>\n- <when value="histdb">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="data" format="blastdbp" label="Protein BLAST database" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" /> \n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="data" format="fasta" label="Protein FASTA '..b"-\n-**What it does**\n-\n-Search a *protein database* using a *protein query*,\n-using the NCBI BLAST+ blastp command line tool.\n-\n-.. class:: warningmark\n-\n-You can also search against a FASTA file of subject protein\n-sequences. 
This is *not* advised because it is slower (only one\n-CPU is used), but more importantly gives e-values for pairwise\n-searches (very small e-values which will look overly signficiant).\n-In most cases you should instead turn the other FASTA file into a\n-database first using *makeblastdb* and search against that.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output. 
Galaxy now uses this extended 24 column output by default.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a ';'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n-\n-Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 
29:2994-3005.\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n- </help>\n-</tool>\n" |
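Note: the help text in this wrapper describes two tabular layouts, the standard 12 columns and the extended 24 columns, with the extra columns placed after the standard ones. A minimal Python sketch of a reader that accepts either layout is given below. The function name `read_blast_tabular` and the choice to skip `#` comment lines are assumptions made for illustration; they are not part of the wrapper.

```python
import io

# Column names as listed in the wrapper's help text.
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXT_COLUMNS = [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen",
]


def read_blast_tabular(handle):
    """Yield one dict per hit from 12- or 24-column tabular BLAST output.

    Hypothetical helper: blank lines and lines starting with '#' are skipped.
    All values are returned as strings.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 12:
            names = STD_COLUMNS
        elif len(fields) == 24:
            names = STD_COLUMNS + EXT_COLUMNS
        else:
            raise ValueError("Expected 12 or 24 columns, got %d" % len(fields))
        yield dict(zip(names, fields))


if __name__ == "__main__":
    sample = "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t380\n"
    for hit in read_blast_tabular(io.StringIO(sample)):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

Because the first 12 columns are the same in both layouts, a filter written against those names works on either output.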
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_blastx_wrapper.xml --- a/tools/ncbi_blast_plus/ncbi_blastx_wrapper.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,294 +0,0 @@\n-<tool id="ncbi_blastx_wrapper" name="NCBI BLAST+ blastx" version="0.0.19">\n- <description>Search protein database with translated nucleotide query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n- <requirements>\n- <requirement type="binary">blastx</requirement>\n- <requirement type="package" version="2.2.26+">blast+</requirement>\n- </requirements>\n- <version_command>blastx -version</version_command>\n- <command>\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-blastx\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#elif $db_opts.db_opts_selector == "histdb":\n- -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--query_gencode $query_gencode\n--evalue $evalue_cutoff\n--out "$output1"\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n-$adv_opts.filter_query\n-$adv_opts.strand\n--matrix $adv_opts.matrix\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end 
if\n-$adv_opts.ungapped\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <stdio>\n- <!-- Anything other than zero is an error -->\n- <exit_code range="1:" />\n- <exit_code range=":-1" />\n- <!-- In case the return code has not been set propery check stderr too -->\n- <regex match="Error:" />\n- <regex match="Exception:" />\n- </stdio>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" label="Subject database/sequences">\n- <option value="db" selected="True">Locally installed BLAST database</option>\n- <option value="histdb">BLAST database from your history</option>\n- <option value="file">FASTA file from your history (see warning note below)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Protein BLAST database">\n- <options from_file="blastdb_p.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <when value="histdb">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="data" format="blastdbp" label="Protein BLAST database" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="data" format="fasta" label="Protein FASTA file to'..b"ingmark\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *protein database* using a *translated nucleotide query*,\n-using the NCBI BLAST+ blastx command line tool.\n-\n-.. 
class:: warningmark\n-\n-You can also search against a FASTA file of subject protein\n-sequences. This is *not* advised because it is slower (only one\n-CPU is used), but more importantly gives e-values for pairwise\n-searches (very small e-values which will look overly signficiant).\n-In most cases you should instead turn the other FASTA file into a\n-database first using *makeblastdb* and search against that.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output. 
Galaxy now uses this extended 24 column output by default.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a ';'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length \n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n- </help>\n-</tool>\n" |
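Note: the `<stdio>` block in these wrappers treats any non-zero exit code as an error and additionally looks for `Error:` and `Exception:` in the tool output in case the exit code was not set. The same rule can be sketched in Python as follows. This is an illustration only: the function name `job_failed` is hypothetical, and the sketch checks stderr alone (as the XML comment suggests), which may not match exactly which streams Galaxy inspects.

```python
import re

# Patterns taken from the <regex match="..."/> entries in the wrapper.
ERROR_PATTERNS = [re.compile("Error:"), re.compile("Exception:")]


def job_failed(exit_code, stderr_text):
    """Return True if the run should be treated as failed.

    Hypothetical helper mirroring the wrapper's rule: any exit code other
    than zero is an error; otherwise fall back to scanning stderr.
    """
    if exit_code != 0:
        return True
    return any(pattern.search(stderr_text) for pattern in ERROR_PATTERNS)


if __name__ == "__main__":
    print(job_failed(0, ""))
    print(job_failed(0, "BLAST engine error: Error: out of memory"))
    print(job_failed(-1, ""))
```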
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_makeblastdb.xml --- a/tools/ncbi_blast_plus/ncbi_makeblastdb.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,129 +0,0 @@
-<tool id="ncbi_makeblastdb" name="NCBI BLAST+ makeblastdb" version="0.0.5">
- <description>Make BLAST database</description>
- <requirements>
- <requirement type="binary">makeblastdb</requirement>
- <requirement type="package" version="2.2.26+">blast+</requirement>
- </requirements>
- <version_command>makeblastdb -version</version_command>
- <command>
-makeblastdb -out "${os.path.join($outfile.extra_files_path,'blastdb')}"
-$parse_seqids
-$hash_index
-## Single call to -in with multiple filenames space separated with outer quotes
-## (presumably any filenames with spaces would be a problem). Note this gives
-## some extra spaces, e.g. -in " file1 file2 file3 " but BLAST seems happy:
--in "
-#for $i in $in
-${i.file} #end for
-"
-#if $title:
--title "$title"
-#else:
-##Would default to being based on the cryptic Galaxy filenames, which is unhelpful
--title "BLAST Database"
-#end if
--dbtype $dbtype
-## #set $sep = '-mask_data '
-## #for $i in $mask_data
-## $sep${i.file}
-## #set $set = ', '
-## #end for
-## #set $sep = '-gi_mask -gi_mask_name '
-## #for $i in $gi_mask
-## $sep${i.file}
-## #set $set = ', '
-## #end for
-## #if $tax.select == 'id':
-## -taxid $tax.id
-## #else if $tax.select == 'map':
-## -taxid_map $tax.map
-## #end if
-</command>
-<stdio>
- <!-- Anything other than zero is an error -->
- <exit_code range="1:" />
- <exit_code range=":-1" />
- <!-- In case the return code has not been set propery check stderr too -->
- <regex match="Error:" />
- <regex match="Exception:" />
-</stdio>
-<inputs>
- <param name="dbtype" type="select" display="radio" label="Molecule type of input">
- <option value="prot">protein</option>
- <option value="nucl">nucleotide</option>
- </param>
- <!-- TODO Allow merging of existing BLAST databases (conditional on the database type)
- <repeat name="in" title="Blast or Fasta Database" min="1">
- <param name="file" type="data" format="fasta,blastdbn,blastdbp" label="Blast or Fasta database" />
- </repeat>
- -->
- <repeat name="in" title="FASTA file" min="1">
- <param name="file" type="data" format="fasta" />
- </repeat>
- <param name="title" type="text" value="" label="Title for BLAST database" help="This is the database name shown in BLAST search output" />
- <param name="parse_seqids" type="boolean" truevalue="-parse_seqids" falsevalue="" checked="False" label="Parse the sequence identifiers" help="This is only advised if your FASTA file follows the NCBI naming conventions using pipe '|' symbols" />
- <param name="hash_index" type="boolean" truevalue="-hash_index" falsevalue="" checked="true" label="Enable the creation of sequence hash values." help="These hash values can then be used to quickly determine if a given sequence data exists in this BLAST database." />
-
- <!-- SEQUENCE MASKING OPTIONS -->
- <!-- TODO
- <repeat name="mask_data" title="Provide one or more files containing masking data">
- <param name="file" type="data" format="asnb" label="File containing masking data" help="As produced by NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)" />
- </repeat>
- <repeat name="gi_mask" title="Create GI indexed masking data">
- <param name="file" type="data" format="asnb" label="Masking data output file" />
- </repeat>
- -->
-
- <!-- TAXONOMY OPTIONS -->
- <!-- TODO
- <conditional name="tax">
- <param name="select" type="select" label="Taxonomy options">
- <option value="">Do not assign sequences to Taxonomy IDs</option>
- <option value="id">Assign all sequences to one Taxonomy ID</option>
- <option value="map">Supply text file mapping sequence IDs to taxnomy IDs</option>
- </param>
- <when value="">
- </when>
- <when value="id">
- <param name="id" type="integer" value="" label="NCBI taxonomy ID" help="Integer >=0" />
- </when>
- <when value="map">
- <param name="file" type="data" format="txt" label="Seq ID : Tax ID mapping file" help="Format: SequenceId TaxonomyId" />
- </when>
- </conditional>
- -->
-</inputs>
-<outputs>
- <!-- If we only accepted one FASTA file, we could use its human name here... -->
- <data name="outfile" format="data" label="${dbtype.value_label} BLAST database from ${on_string}">
- <change_format>
- <when input="dbtype" value="nucl" format="blastdbn"/>
- <when input="dbtype" value="prot" format="blastdbp"/>
- </change_format>
- </data>
-</outputs>
-<help>
-**What it does**
-
-Make BLAST database from one or more FASTA files and/or BLAST databases.
-
-This is a wrapper for the NCBI BLAST+ tool 'makeblastdb', which is the
-replacement for the 'formatdb' tool in the NCBI 'legacy' BLAST suite.
-
-<!--
-Applying masks to an existing BLAST database will not change the original database; a new database will be created.
-For this reason, it's best to apply all masks at once to minimize the number of unnecessary intermediate databases.
--->
-
-**Documentation**
-
-http://www.ncbi.nlm.nih.gov/books/NBK1763/
-
-**References**
-
-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.
-
-This wrapper is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus
-</help>
-</tool>
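Note: the `<command>` template of the makeblastdb wrapper assembles a single `makeblastdb` call from the form inputs: one `-in` argument holding all FASTA filenames separated by spaces, a fallback title of "BLAST Database" when none is given, and optional `-parse_seqids` and `-hash_index` switches. The Python sketch below mirrors that logic as an argument list. The function name `makeblastdb_args` and its parameter names are hypothetical; only the flags that appear in the template are used.

```python
def makeblastdb_args(out_prefix, fasta_files, dbtype, title="",
                     parse_seqids=False, hash_index=True):
    """Build an argument list equivalent to the wrapper's command template.

    Hypothetical helper. As in the template, all input files go into one
    -in value separated by spaces, so filenames containing spaces would
    be a problem.
    """
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    if not fasta_files:
        raise ValueError("At least one FASTA file is required")
    args = ["makeblastdb", "-out", out_prefix]
    if parse_seqids:
        args.append("-parse_seqids")
    if hash_index:
        args.append("-hash_index")
    args += ["-in", " ".join(fasta_files)]
    # Without a title BLAST would fall back to the cryptic dataset filenames.
    args += ["-title", title if title else "BLAST Database"]
    args += ["-dbtype", dbtype]
    return args


if __name__ == "__main__":
    print(makeblastdb_args("out/blastdb", ["a.fasta", "b.fasta"], "prot"))
```

Passing the list to `subprocess.call` without a shell avoids the quoting that the template has to do by hand.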
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_rpsblast_wrapper.xml --- a/tools/ncbi_blast_plus/ncbi_rpsblast_wrapper.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,238 +0,0 @@\n-<tool id="ncbi_rpsblast_wrapper" name="NCBI BLAST+ rpsblast" version="0.0.4">\n- <description>Search protein domain database (PSSMs) with protein query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n- <requirements>\n- <requirement type="binary">rpsblast</requirement>\n- <requirement type="package" version="2.2.26+">blast+</requirement>\n- </requirements>\n- <version_command>rpsblast -version</version_command>\n- <command>\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-rpsblast\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#elif $db_opts.db_opts_selector == "histdb":\n- -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n-#end if\n--evalue $evalue_cutoff\n--out "$output1"\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n-$adv_opts.filter_query\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <stdio>\n- <!-- Anything other than zero is an error -->\n- 
<exit_code range="1:" />\n- <exit_code range=":-1" />\n- <!-- In case the return code has not been set propery check stderr too -->\n- <regex match="Error:" />\n- <regex match="Exception:" />\n- </stdio>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" label="Protein domain database (PSSM)">\n- <option value="db" selected="True">Locally installed BLAST database</option>\n-\t <!-- TODO - define new datatype\n- <option value="histdb">BLAST protein domain database from your history</option>\n-\t -->\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Protein domain database">\n- <options from_file="blastdb_d.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="hidden" value="" /> \n- </when>\n-\t <!-- TODO - define new datatype\n- <when value="histdb">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="data" format="blastdbd" label="Protein domain database" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n-\t -->\n- </conditional>\n- <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n- <param name="out_format" type="select" label="Output format">\n- <option value="6">Tabular (standard 12 columns)</option>\n- <option value="ext" selected="True">Tabular (extended 24 columns)</option>\n-'..b"agments classified in the COGs resource, which focuses primarily\n-on prokaryotes, see http://www.ncbi.nlm.nih.gov/COG/new/\n-\n-*Pfam* - PSSMs from Pfam-A seed alignment database, see\n-http://pfam.sanger.ac.uk/\n-\n-*Smart* - PSSMs from SMART domain alignment database, see\n-http://smart.embl-heidelberg.de/\n-\n-*Tigr* - PSSMs from TIGRFAM database of protein 
families, see\n-http://www.jcvi.org/cms/research/projects/tigrfams/overview/\n-\n-*Prk* - PSSms from automatically aligned stable clusters in the\n-Protein Clusters database, see\n-http://www.ncbi.nlm.nih.gov/proteinclusters?cmd=search&db=proteinclusters\n-\n-The exact list of domain databases offered will depend on how your\n-local Galaxy has been configured.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output. 
Galaxy now uses this extended 24 column output by default.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a ';'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W327-31.\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n- </help>\n-</tool>\n" |
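Note: in the advanced options of these wrappers, `-max_target_seqs` and `-word_size` are only added to the command line when the field is non-empty and holds an integer greater than zero (the template's `str(...) and int(str(...)) > 0` test). A minimal Python sketch of that guard follows. The helper name `optional_int_flag` is hypothetical, and stripping surrounding whitespace is an added assumption not present in the template.

```python
def optional_int_flag(flag, raw_value):
    """Return [flag, value] if raw_value is a positive integer, else [].

    Hypothetical helper: raw_value may be an int, a string, or any object
    whose str() gives the number, which is why it goes through str() first.
    """
    text = str(raw_value).strip()
    if text and int(text) > 0:
        return [flag, text]
    return []


if __name__ == "__main__":
    command = ["blastp"]
    command += optional_int_flag("-max_target_seqs", 25)
    command += optional_int_flag("-word_size", "")
    print(command)
```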
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml --- a/tools/ncbi_blast_plus/ncbi_rpstblastn_wrapper.xml Wed May 29 10:03:48 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,239 +0,0 @@\n-<tool id="ncbi_rpstblastn_wrapper" name="NCBI BLAST+ rpstblastn" version="0.0.4">\n- <description>Search protein domain database (PSSMs) with translated nucleotide query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n- <requirements>\n- <requirement type="binary">rpstblastn</requirement>\n- <requirement type="package" version="2.2.26+">blast+</requirement>\n- </requirements>\n- <version_command>rpstblastn -version</version_command>\n- <command>\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-rpstblastn\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#elif $db_opts.db_opts_selector == "histdb":\n- -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n-#end if\n--evalue $evalue_cutoff\n--out "$output1"\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n-##Seems rpstblastn does not currently support multiple threads :(\n-##-num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n-$adv_opts.filter_query\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-$adv_opts.parse_deflines\n-## End of advanced 
options:\n-#end if\n- </command>\n- <stdio>\n- <!-- Anything other than zero is an error -->\n- <exit_code range="1:" />\n- <exit_code range=":-1" />\n- <!-- In case the return code has not been set propery check stderr too -->\n- <regex match="Error:" />\n- <regex match="Exception:" />\n- </stdio>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" label="Protein domain database (PSSM)">\n- <option value="db" selected="True">Locally installed BLAST database</option>\n- <!-- TODO - define new datatype\n- <option value="histdb">BLAST protein domain database from your history</option>\n- -->\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Protein domain database">\n- <options from_file="blastdb_d.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <!-- TODO - define new datatype\n- <when value="histdb">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="data" format="blastdbd" label="Protein domain database" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- -->\n- </conditional>\n- <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n- <param name="out_format" type="select" label="Output format">\n- <option value="6">Tabul'..b"agments classified in the COGs resource, which focuses primarily\n-on prokaryotes, see http://www.ncbi.nlm.nih.gov/COG/new/\n-\n-*Pfam* - PSSMs from Pfam-A seed alignment database, see\n-http://pfam.sanger.ac.uk/\n-\n-*Smart* - PSSMs from SMART domain alignment database, see\n-http://smart.embl-heidelberg.de/\n-\n-*Tigr* - PSSMs from TIGRFAM database of protein families, 
see\n-http://www.jcvi.org/cms/research/projects/tigrfams/overview/\n-\n-*Prk* - PSSms from automatically aligned stable clusters in the\n-Protein Clusters database, see\n-http://www.ncbi.nlm.nih.gov/proteinclusters?cmd=search&db=proteinclusters\n-\n-The exact list of domain databases offered will depend on how your\n-local Galaxy has been configured.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output. 
Galaxy now uses this extended 24 column output by default.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a ';'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W327-31.\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n- </help>\n-</tool>\n" |
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_tblastn_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_tblastn_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,340 +0,0 @@\n-<tool id="ncbi_tblastn_wrapper" name="NCBI BLAST+ tblastn" version="0.0.20">\n- <description>Search translated nucleotide database with protein query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n- <requirements>\n- <requirement type="binary">tblastn</requirement>\n- <requirement type="package" version="2.2.26+">blast+</requirement>\n- </requirements>\n- <version_command>tblastn -version</version_command>\n- <command>\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-tblastn\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#elif $db_opts.db_opts_selector == "histdb":\n- -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--evalue $evalue_cutoff\n--out "$output1"\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n--db_gencode $adv_opts.db_gencode\n-$adv_opts.filter_query\n--matrix $adv_opts.matrix\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-##Ungapped disabled for now - see comments 
below\n-##$adv_opts.ungapped\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <stdio>\n- <!-- Anything other than zero is an error -->\n- <exit_code range="1:" />\n- <exit_code range=":-1" />\n- <!-- In case the return code has not been set propery check stderr too -->\n- <regex match="Error:" />\n- <regex match="Exception:" />\n- </stdio>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" label="Subject database/sequences">\n- <option value="db" selected="True">Locally installed BLAST database</option>\n- <option value="histdb">BLAST database from your history</option>\n- <option value="file">FASTA file from your history (see warning note below)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Nucleotide BLAST database">\n- <options from_file="blastdb.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <when value="histdb">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="data" format="blastdbn" label="Nucleotide BLAST database" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="data" '..b"mark\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *translated nucleotide database* using a *protein query*,\n-using the NCBI BLAST+ tblastn command line tool.\n-\n-.. 
class:: warningmark\n-\n-You can also search against a FASTA file of subject nucleotide\n-sequences. This is *not* advised because it is slower (only one\n-CPU is used), but more importantly gives e-values for pairwise\n-searches (very small e-values which will look overly signficiant).\n-In most cases you should instead turn the other FASTA file into a\n-database first using *makeblastdb* and search against that.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output. 
Galaxy now uses this extended 24 column output by default.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a ';'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n- </help>\n-</tool>\n" |
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/ncbi_tblastx_wrapper.xml
--- a/tools/ncbi_blast_plus/ncbi_tblastx_wrapper.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,294 +0,0 @@\n-<tool id="ncbi_tblastx_wrapper" name="NCBI BLAST+ tblastx" version="0.0.20">\n- <description>Search translated nucleotide database with translated nucleotide query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject,histdb" merge_outputs="output1"></parallelism>\n- <requirements>\n- <requirement type="binary">tblastx</requirement>\n- <requirement type="package" version="2.2.26+">blast+</requirement>\n- </requirements>\n- <version_command>tblastx -version</version_command>\n- <command>\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-tblastx\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#elif $db_opts.db_opts_selector == "histdb":\n- -db "${os.path.join($db_opts.histdb.extra_files_path,\'blastdb\')}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--query_gencode $query_gencode\n--evalue $evalue_cutoff\n--out "$output1"\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n--db_gencode $adv_opts.db_gencode\n-$adv_opts.filter_query\n-$adv_opts.strand\n--matrix $adv_opts.matrix\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size 
$adv_opts.word_size\n-#end if\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <stdio>\n- <!-- Anything other than zero is an error -->\n- <exit_code range="1:" />\n- <exit_code range=":-1" />\n- <!-- In case the return code has not been set propery check stderr too -->\n- <regex match="Error:" />\n- <regex match="Exception:" />\n- </stdio>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" label="Subject database/sequences">\n- <option value="db" selected="True">Locally installed BLAST database</option>\n- <option value="histdb">BLAST database from your history</option>\n- <option value="file">FASTA file from your history (see warning note below)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Nucleotide BLAST database">\n- <options from_file="blastdb.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <when value="histdb">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="data" format="blastdbn" label="Nucleotide BLAST database" />\n- <param name="subject" type="hidden" value="" />\n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" />\n- <param name="histdb" type="hidden" value="" />\n- <param name="subject" type="data" format'..b"mark\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *translated nucleotide database* using a *protein query*,\n-using the NCBI BLAST+ tblastx command line tool.\n-\n-.. 
class:: warningmark\n-\n-You can also search against a FASTA file of subject nucleotide\n-sequences. This is *not* advised because it is slower (only one\n-CPU is used), but more importantly gives e-values for pairwise\n-searches (very small e-values which will look overly signficiant).\n-In most cases you should instead turn the other FASTA file into a\n-database first using *makeblastdb* and search against that.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output. 
Galaxy now uses this extended 24 column output by default.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a ';'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus\n- </help>\n-</tool>\n" |
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/repository_dependencies.xml
--- a/tools/ncbi_blast_plus/repository_dependencies.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,5 +0,0 @@
-<?xml version="1.0"?>
-<repositories description="This requires the BLAST datatype definitions (e.g. the BLAST XML format).">
-<!-- Revision 4:f9a7783ed7b6 on the main (and test) tool shed is v0.0.14 which added BLAST databases -->
-<repository changeset_revision="f9a7783ed7b6" name="blast_datatypes" owner="devteam" toolshed="http://testtoolshed.g2.bx.psu.edu" />
-</repositories>
diff -r c1a6e5aefee0 -r 688f3fb09a6a tools/ncbi_blast_plus/tool_dependencies.xml
--- a/tools/ncbi_blast_plus/tool_dependencies.xml Wed May 29 10:03:48 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,20 +0,0 @@
-<?xml version="1.0"?>
-<tool_dependency>
-    <package name="blast+" version="2.2.26+">
-        <install version="1.0">
-            <actions>
-                <action type="download_by_url">ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.26/ncbi-blast-2.2.26+-src.tar.gz</action>
-                <action type="shell_command">cd c++ && ./configure --prefix=$INSTALL_DIR && make && make install</action>
-                <action type="set_environment">
-                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>
-                </action>
-            </actions>
-        </install>
-        <readme>
-Downloads and compiles BLAST+ from the NCBI, which assumes you have
-all the required build dependencies installed. See:
-http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
-        </readme>
-    </package>
-</tool_dependency>
-