<tool id="kraken-filter" name="Filter Kraken" version="1.0.0">
    <description>by confidence score</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <command>
<![CDATA[
        kraken-filter @INPUT_DATABASE@ --threshold $threshold "${input}" > "$filtered_output"
]]>
    </command>
    <inputs>
        <param format="tabular" label="Kraken classified output" name="input" type="data" />
        <param label="Confidence threshold" max="1" min="0" name="threshold" type="float" value="0" />
        <expand macro="input_database" />
    </inputs>
    <outputs>
        <data format="tabular" name="filtered_output" />
    </outputs>
    <help>
<![CDATA[

***Note that the database used must be the same as the one used to generate
the output file, or kraken-filter may encounter problems.***

A sequence label's score is a fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label, and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., the k-mers that were queried against the database). Consider this example of the LCA mappings in Kraken's output:

"562:13 561:4 A:31 0:1 562:3" would indicate that:

- the first 13 k-mers mapped to taxonomy ID #562
- the next 4 k-mers mapped to taxonomy ID #561
- the next 31 k-mers contained an ambiguous nucleotide
- the next k-mer was not in the database
- the last 3 k-mers mapped to taxonomy ID #562

In this case, ID #561 is the parent node of #562. Here, a label of #562 for this sequence would have a score of C/Q = (13+3)/(13+4+1+3) = 16/21. A label of #561 would have a score of C/Q = (13+4+3)/(13+4+1+3) = 20/21. If a user specified a threshold over 16/21, kraken-filter would adjust the original label from #562 to #561; if the threshold was greater than 20/21, the sequence would become unclassified.
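
The arithmetic above can be reproduced with a short Python sketch. It is an illustration only, not kraken-filter's actual implementation: the function names and the `PARENT` map are hypothetical, and the map holds just the single parent relationship (562 -> 561) that the example needs.

```python
from fractions import Fraction

# Hypothetical child -> parent map covering only this example's taxa.
PARENT = {562: 561}


def in_clade(taxid, root, parent):
    """Return True if taxid is root or one of root's descendants."""
    while taxid is not None:
        if taxid == root:
            return True
        taxid = parent.get(taxid)  # None once the map has no parent entry
    return False


def label_score(lca_mappings, label, parent):
    """Compute the score C/Q of a label from an LCA mapping string."""
    c = 0  # k-mers mapped to LCA values inside the label's clade
    q = 0  # k-mers queried against the database
    for field in lca_mappings.split():
        taxon, count = field.split(":")
        count = int(count)
        if taxon == "A":
            continue  # ambiguous k-mers were never queried
        q += count
        taxid = int(taxon)
        # Taxonomy ID 0 marks k-mers that were not found in the database.
        if taxid != 0 and in_clade(taxid, label, parent):
            c += count
    return Fraction(c, q) if q else Fraction(0)


mappings = "562:13 561:4 A:31 0:1 562:3"
print(label_score(mappings, 562, PARENT))  # 16/21
print(label_score(mappings, 561, PARENT))  # 20/21
```

With a threshold of 0.8, for example, #562 (16/21, about 0.76) falls below the threshold while #561 (20/21, about 0.95) does not, so the label would move from #562 to #561.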
]]>
    </help>
    <expand macro="version_command" />
    <expand macro="requirements" />
    <expand macro="stdio" />
    <expand macro="citations" />
</tool>