annotate tools/blast_rbh/blast_rbh.xml @ 20:e8f8c580bcca draft
planemo upload for repository https://github.com/peterjc/galaxy_blast/tools/blast_rbh commit 05c2a834609f1fd02372c4f9b0a0733680fe9675-dirty
author | peterjc |
---|---|
date | Fri, 15 May 2015 05:47:40 -0400 |
parents | b6531758b428 |
children | b41f8c43705e |
rev | line source |
---|---|
1 <tool id="blast_reciprocal_best_hits" name="BLAST Reciprocal Best Hits (RBH)" version="0.1.7"> |
0 | 2 <description>from two FASTA files</description> |
3 <requirements> | |
12 | 4 <requirement type="package" version="1.64">biopython</requirement> |
5 <requirement type="python-module">Bio</requirement> | |
6 <requirement type="binary">makeblastdb</requirement> | |
7 <requirement type="binary">blastp</requirement> | |
8 <requirement type="binary">blastn</requirement> | |
13 | 9 <requirement type="package" version="2.2.30">blast+</requirement> |
0 | 10 </requirements> |
11 <stdio> |
12 <!-- Anything other than zero is an error --> |
13 <exit_code range="1:" /> |
14 <exit_code range=":-1" /> |
15 </stdio> |
0 | 16 <version_command interpreter="python"> |
17 blast_rbh.py --version | |
18 </version_command> | |
19 <command interpreter="python"> | |
20 blast_rbh.py "$fasta_a" "$fasta_b" |
21 -a $seq.dbtype |
0 | 22 #if $seq.dbtype=="nucl" |
23 -t $seq.nucl_type |
0 | 24 #else |
25 -t $seq.prot_type |
0 | 26 #end if |
12 | 27 $make_nr |
28 -i $identity |
29 -c $q_cover |
30 -o "$output" |
0 | 31 </command> |
32 <inputs> | |
33 <!-- Galaxy does not have sub-types for protein vs nucleotide FASTA --> | |
34 <param name="fasta_a" type="data" format="fasta" | |
35 label="Genes/proteins from species A" | |
36 description="FASTA file, one sequence per gene/protein." /> | |
37 <param name="fasta_b" type="data" format="fasta" | |
38 label="Genes/proteins from species B" | |
39 description="FASTA file, one sequence per gene/protein." /> | |
40 <conditional name="seq"> | |
41 <param name="dbtype" type="select" label="Molecule type of FASTA inputs"> | |
42 <option value="prot">protein</option> | |
43 <option value="nucl">nucleotide</option> | |
44 </param> | |
45 <when value="prot"> | |
46 <param name="prot_type" type="select" display="radio" label="Type of BLAST"> | |
47 <option value="blastp">blastp - Traditional BLASTP to compare a protein query to a protein database</option> | |
48 <option value="blastp-fast">blastp-fast - Uses longer words as described by Shiryev et al (2007)</option> |
0 | 49 <option value="blastp-short">blastp-short - BLASTP optimized for queries shorter than 30 residues</option> |
50 </param> | |
51 </when> | |
52 <when value="nucl"> | |
53 <param name="nucl_type" type="select" display="radio" label="Type of BLAST"> | |
54 <option value="megablast">megablast - Traditional megablast used to find very similar (e.g., intraspecies or closely related species) sequences</option> | |
55 <option value="blastn">blastn - Traditional BLASTN requiring an exact match of 11, for somewhat similar sequences</option> | |
56 <option value="blastn-short">blastn-short - BLASTN program optimized for sequences shorter than 50 bases</option> | |
57 <option value="dc-megablast">dc-megablast - Discontiguous megablast used to find more distant (e.g., interspecies) sequences</option> | |
58 <option value="tblastx">tblastx - TBLASTX program using translated query against translated database (protein level matches)</option> |
0 | 59 </param> |
60 </when> | |
61 </conditional> | |
62 <param name="identity" type="float" value="70" min="0" max="100" | |
63 label="Minimum percentage identity for BLAST matches" | |
64 help="Default is 70%, use 0 for no filtering." /> | |
65 <param name="q_cover" type="float" value="50" min="0" max="100" | |
66 label="Minimum percentage query coverage for BLAST matches" | |
67 help="Default is 50%, use 0 for no filtering." /> | |
12 | 68 <param name="make_nr" type="boolean" checked="false" truevalue="--nr" falsevalue="" |
69 label="Process input FASTA files to collapse identical sequences" | |
70 help="i.e. First make the input non-redundant" /> | |
0 | 71 </inputs> |
72 <outputs> | |
73 <data name="output" format="tabular" label="BLAST RBH: $fasta_a.name vs $fasta_b.name" /> | |
74 </outputs> | |
75 <tests> | |
76 <test> | |
1 | 77 <param name="fasta_a" value="four_human_proteins.fasta" ftype="fasta"/> |
78 <param name="fasta_b" value="rhodopsin_proteins.fasta" ftype="fasta"/> | |
79 <param name="dbtype" value="prot"/> | |
80 <param name="prot_type" value="blastp"/> | |
81 <param name="identity" value="0.0"/> | |
82 <param name="q_cover" value="0.0"/> | |
83 <output name="output" file="rbh_blastp_four_human_vs_rhodopsin_proteins.tabular" ftype="tabular"/> | |
84 </test> | |
85 <test> | |
0 | 86 <param name="fasta_a" value="rhodopsin_nucs.fasta" ftype="fasta"/> |
87 <param name="fasta_b" value="three_human_mRNA.fasta" ftype="fasta"/> | |
88 <param name="dbtype" value="nucl"/> | |
89 <param name="nucl_type" value="megablast"/> | |
90 <param name="identity" value="0.0"/> | |
91 <param name="q_cover" value="0.0"/> | |
92 <output name="output" file="rbh_megablast_rhodopsin_nucs_vs_three_human_mRNA.tabular" ftype="tabular"/> | |
93 </test> | |
94 <test> | |
95 <param name="fasta_a" value="rhodopsin_nucs.fasta" ftype="fasta"/> | |
96 <param name="fasta_b" value="three_human_mRNA.fasta" ftype="fasta"/> | |
97 <param name="dbtype" value="nucl"/> | |
98 <param name="nucl_type" value="megablast"/> | |
99 <param name="identity" value="92"/> | |
100 <param name="q_cover" value="86"/> | |
101 <output name="output" file="rbh_megablast_rhodopsin_nucs_vs_three_human_mRNA.tabular" ftype="tabular"/> | |
102 </test> | |
103 <!-- push the percentage identity over the 92.07% level --> | |
104 <test> | |
105 <param name="fasta_a" value="rhodopsin_nucs.fasta" ftype="fasta"/> | |
106 <param name="fasta_b" value="three_human_mRNA.fasta" ftype="fasta"/> | |
107 <param name="dbtype" value="nucl"/> | |
108 <param name="nucl_type" value="megablast"/> | |
109 <param name="identity" value="92.5"/> | |
110 <param name="q_cover" value="86"/> | |
111 <output name="output" file="rbh_none.tabular" ftype="tabular"/> | |
112 </test> | |
113 <!-- push the coverage over the 86% level --> | |
114 <test> | |
115 <param name="fasta_a" value="rhodopsin_nucs.fasta" ftype="fasta"/> | |
116 <param name="fasta_b" value="three_human_mRNA.fasta" ftype="fasta"/> | |
117 <param name="dbtype" value="nucl"/> | |
118 <param name="nucl_type" value="megablast"/> | |
119 <param name="identity" value="92"/> | |
120 <param name="q_cover" value="87"/> | |
121 <output name="output" file="rbh_none.tabular" ftype="tabular"/> | |
122 </test> | |
123 <test> | |
124 <param name="fasta_a" value="rhodopsin_nucs.fasta" ftype="fasta"/> |
125 <param name="fasta_b" value="three_human_mRNA.fasta" ftype="fasta"/> |
126 <param name="dbtype" value="nucl"/> |
127 <param name="nucl_type" value="tblastx"/> |
128 <param name="identity" value="0.0"/> |
129 <param name="q_cover" value="0.0"/> |
130 <output name="output" file="rbh_tblastx_rhodopsin_nucs_vs_three_human_mRNA.tabular" ftype="tabular"/> |
131 </test> |
132 <test> |
0 | 133 <param name="fasta_a" value="three_human_mRNA.fasta" ftype="fasta"/> |
134 <param name="fasta_b" value="rhodopsin_nucs.fasta" ftype="fasta"/> | |
135 <param name="dbtype" value="nucl"/> | |
136 <param name="nucl_type" value="blastn"/> | |
137 <param name="identity" value="0.0"/> | |
138 <param name="q_cover" value="0.0"/> | |
139 <output name="output" file="rbh_blastn_three_human_mRNA_vs_rhodopsin_nucs.tabular" ftype="tabular"/> | |
140 </test> | |
6 | 141 <!-- this pair of examples tests tied best hits --> |
142 <test> | |
143 <param name="fasta_a" value="k12_ten_proteins.fasta" ftype="fasta"/> | |
144 <param name="fasta_b" value="k12_edited_proteins.fasta" ftype="fasta"/> | |
145 <param name="dbtype" value="prot"/> |
6 | 146 <param name="prot_type" value="blastp"/> |
147 <param name="identity" value="0.0"/> | |
148 <param name="q_cover" value="0.0"/> | |
149 <output name="output" file="rbh_blastp_k12.tabular" ftype="tabular"/> | |
150 </test> | |
151 <test> | |
152 <param name="fasta_a" value="k12_edited_proteins.fasta" ftype="fasta"/> | |
153 <param name="fasta_b" value="k12_ten_proteins.fasta" ftype="fasta"/> | |
154 <param name="dbtype" value="prot"/> |
6 | 155 <param name="prot_type" value="blastp"/> |
156 <param name="identity" value="0.0"/> | |
157 <param name="q_cover" value="0.0"/> | |
158 <output name="output" file="rbh_blastp_k12.tabular" ftype="tabular"/> | |
159 </test> | |
160 <!-- this tests self-comparison --> |
161 <test> |
162 <param name="fasta_a" value="k12_edited_proteins.fasta" ftype="fasta"/> |
163 <param name="fasta_b" value="k12_edited_proteins.fasta" ftype="fasta"/> |
164 <param name="dbtype" value="prot"/> |
165 <param name="nucl_type" value="blastp-fast"/> |
166 <param name="identity" value="80.0"/> |
167 <param name="q_cover" value="80.0"/> |
168 <output name="output" file="rbh_blastp_k12_self.tabular" ftype="tabular"/> |
169 </test> |
0 | 170 </tests> |
171 <help> | |
172 **What it does** | |
173 | |
2 | 174 Takes two FASTA files (*species A* and *species B*), builds a BLAST database |
0 | 175 for each, runs reciprocal BLAST searches (*A vs B*, and *B vs A*), optionally |
2 | 176 filters the HSPs, and then compiles a list of the reciprocal best hits (RBH). |
0 | 177 |
178 The output from this tool is a tabular file containing multiple columns, with |
2 | 179 information about the BLAST matches used: |
0 | 180 |
181 ====== ================================== |
0 | 182 Column Description |
183 ------ ---------------------------------- |
2 | 184 1 ID from *species A* |
185 2 ID from *species B* | |
186 3 Length of sequence *A* |
187 4 Length of sequence *B* |
188 5 Percentage of sequence *A* covered |
189 6 Percentage of sequence *B* covered |
190 7 HSP alignment length |
191 8 HSP percentage identity |
192 9 HSP bitscore |
193 ====== ================================== |
2 | 194 |
195 These values correspond to the ``qseqid``/``sseqid``, ``qlen``/``slen``, |
196 ``qcovhsp``, ``length``, ``pident`` and ``bitscore`` values in the BLAST+ |
197 tabular output. |
198 |
199 For the alignment length, bitscore and percentage identity the values for |
200 *A vs B* and *B vs A* are typically the same, so their minimum is shown. |
201 The coverage values are given by the HSP alignment length divided by the |
202 sequence length (adjusted by a factor of three for TBLASTX). |
0 | 203 |
6 | 204 Note that if a sequence has equally scoring top BLAST matches to multiple |
205 sequences in the other file, it will not be considered for an RBH. This | |
206 can happen following gene duplication, or for (near) identical gene | |
207 duplicates. | |
208 | |
12 | 209 The tool can optionally make the FASTA files non-redundant by replacing |
210 repeated identical sequences with a single representative before building | |
211 the databases and running BLAST. | |
212 | |
213 Finally, the tool can be run using the same FASTA input file to look for | |
214 RBH within the dataset. In this case, self matches are discarded. | |
215 | |
0 | 216 .. class:: warningmark |
217 | |
218 **Note** | |
219 | |
220 If you are trying to use BLAST RBH matches to identify candidate orthologues | |
221 or transfer annotation, you *must* use a percentage identity and minimum | |
222 coverage threshold or similar. See: | |
223 | |
224 Punta and Ofran (2008) The Rough Guide to In Silico Function Prediction, | |
225 or How To Use Sequence and Structure Information To Predict Protein | |
226 Function. PLoS Comput Biol 4(10): e1000160. | |
227 http://dx.doi.org/10.1371/journal.pcbi.1000160 | |
228 | |
229 The defaults are to require 70% sequence identity over the aligned region | |
230 (using ``pident`` in the BLAST+ tabular output), and that the HSP alignment | |
231 covers at least 50% of the query sequence (using ``qcovhsp`` in the BLAST+ | |
232 tabular output). | |
233 | |
234 | |
235 **References** | |
236 | |
237 Please cite: |
238 |
239 P.J.A. Cock, J.M. Chilton, B. Gruening, J.E. Johnson, N. Soranzo (2015). |
240 NCBI BLAST+ integrated into Galaxy. |
241 bioRxiv preprint. |
242 http://dx.doi.org/10.1101/014043 |
0 | 243 |
244 Christiam Camacho et al. (2009). | |
245 BLAST+: architecture and applications. | |
246 BMC Bioinformatics. 15;10:421. | |
247 http://dx.doi.org/10.1186/1471-2105-10-421 | |
248 | |
249 This wrapper is available to install into other Galaxy Instances via the Galaxy | |
250 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/blast_rbh | |
251 </help> | |
10 | 252 <citations> |
253 <citation type="doi">10.1186/1471-2105-10-421</citation> | |
254 <citation type="doi">10.1101/014043</citation> |
255 <!-- TODO - Update once "NCBI BLAST+ integrated into Galaxy" formally published --> |
10 | 256 </citations> |
0 | 257 </tool> |
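
The help text in the wrapper describes the selection logic in prose: filter HSPs by identity and query coverage, take each sequence's top hit, drop sequences whose top hit is tied between different subjects, discard self matches when a file is compared against itself, and keep only pairs that are each other's best hit. The sketch below illustrates that logic in Python. It is not the code in `blast_rbh.py`: the `Hit` record, the function names, and the `skip_self` flag are hypothetical, and ranking hits by bitscore is an assumption made for illustration. Field names follow the BLAST+ tabular columns mentioned in the help (`qseqid`, `sseqid`, `pident`, `qcovhsp`, `bitscore`).

```python
from collections import namedtuple

# Hypothetical record for one HSP; not a type used by blast_rbh.py.
Hit = namedtuple("Hit", "qseqid sseqid pident qcovhsp bitscore")


def best_hits(hits, min_identity=70.0, min_coverage=50.0, skip_self=False):
    """Return {query ID: subject ID} for queries with a unique top hit.

    Assumption: "best" means highest bitscore among HSPs passing the filters.
    """
    best = {}  # qseqid -> (top bitscore, subject ID, or None if tied)
    for h in hits:
        if h.pident < min_identity or h.qcovhsp < min_coverage:
            continue  # HSP fails the identity or coverage threshold
        if skip_self and h.qseqid == h.sseqid:
            continue  # self match, only relevant when comparing a file to itself
        current = best.get(h.qseqid)
        if current is None or h.bitscore > current[0]:
            best[h.qseqid] = (h.bitscore, h.sseqid)
        elif h.bitscore == current[0] and h.sseqid != current[1]:
            # Equal top score to a different subject: no unique best hit.
            best[h.qseqid] = (h.bitscore, None)
    return {q: s for q, (_, s) in best.items() if s is not None}


def reciprocal_best_hits(a_vs_b, b_vs_a, **options):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b, **options)
    best_ba = best_hits(b_vs_a, **options)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up example data: a2 has tied top hits (b2 and b3), a3/b4 fall
# below the default 70% identity threshold.
a_vs_b = [
    Hit("a1", "b1", 95.0, 90.0, 200.0),
    Hit("a1", "b2", 80.0, 85.0, 120.0),
    Hit("a2", "b2", 90.0, 80.0, 150.0),
    Hit("a2", "b3", 90.0, 80.0, 150.0),
    Hit("a3", "b4", 60.0, 90.0, 100.0),
]
b_vs_a = [
    Hit("b1", "a1", 95.0, 88.0, 200.0),
    Hit("b2", "a2", 90.0, 80.0, 150.0),
    Hit("b3", "a2", 90.0, 80.0, 150.0),
    Hit("b4", "a3", 60.0, 90.0, 100.0),
]

if __name__ == "__main__":
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
    print(reciprocal_best_hits(a_vs_b, b_vs_a, min_identity=0.0))
```

With the default thresholds only the `a1`/`b1` pair survives; setting `min_identity=0.0` (the "no filtering" value the wrapper documents) also lets the low-identity `a3`/`b4` pair through, which is the situation the help's warning about orthologue prediction is concerned with.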