comparison kraken-filter.xml @ 2:317726be0703 draft
planemo upload for repository https://github.com/galaxyproject/tools-devteam/blob/master/tool_collections/kraken/kraken_filter/ commit cb6ebb843c71dcfc73aa05cc616f8e3229170108-dirty
author | devteam |
---|---|
date | Wed, 15 Jul 2015 15:22:22 -0400 |
parents | f093ba52debe |
children | 7fb926851f66 |
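The help text added in this revision defines a sequence label's score as C/Q. C counts the k-mers mapped inside the clade rooted at the label, and Q counts the k-mers without an ambiguous nucleotide. It then describes how kraken-filter moves the label up the tree until the score meets or exceeds the threshold. The Python below is a minimal sketch of that rule under stated assumptions, not the script's actual implementation. The toy parent map (which places #561 directly under root taxid 1) and all function names are hypothetical, chosen only to reproduce the 16/21 and 20/21 worked example from the help text.

```python
from fractions import Fraction

# Hypothetical toy taxonomy (child taxid -> parent taxid), for illustration only.
# Here #561 sits directly under the root (taxid 1); a real taxonomy has many
# intermediate ranks. The root points to itself.
TOY_PARENTS = {562: 561, 561: 1, 1: 1}
ROOT = 1
UNCLASSIFIED = 0


def parse_lca_mappings(field):
    """Parse a string like "562:13 561:4 A:31 0:1 562:3" into (taxid, count) pairs.

    The ambiguous-nucleotide marker "A" becomes taxid None.
    """
    pairs = []
    for token in field.split():
        taxid, count = token.rsplit(":", 1)
        pairs.append((None if taxid == "A" else int(taxid), int(count)))
    return pairs


def in_clade(taxid, label, parents):
    """Return True if taxid is label itself or one of its descendants."""
    if taxid not in parents:  # e.g. taxid 0: the k-mer was not in the database
        return False
    while taxid != label:
        parent = parents[taxid]
        if parent == taxid:  # reached the root without meeting label
            return False
        taxid = parent
    return True


def label_score(label, mappings, parents):
    """Score C/Q: C counts k-mers mapped inside label's clade, Q counts unambiguous k-mers."""
    q = sum(n for taxid, n in mappings if taxid is not None)
    c = sum(n for taxid, n in mappings
            if taxid is not None and in_clade(taxid, label, parents))
    return Fraction(c, q) if q else Fraction(0)


def filter_label(label, mappings, threshold, parents, root=ROOT):
    """Move label toward the root until its score meets or exceeds threshold.

    Returns UNCLASSIFIED (0) if even the root scores below the threshold.
    """
    if label == UNCLASSIFIED:
        return UNCLASSIFIED
    while label_score(label, mappings, parents) < threshold:
        if label == root:
            return UNCLASSIFIED
        label = parents[label]
    return label


if __name__ == "__main__":
    mappings = parse_lca_mappings("562:13 561:4 A:31 0:1 562:3")
    print(label_score(562, mappings, TOY_PARENTS))         # 16/21
    print(label_score(561, mappings, TOY_PARENTS))         # 20/21
    print(filter_label(562, mappings, 0.8, TOY_PARENTS))   # 561
    print(filter_label(562, mappings, 0.96, TOY_PARENTS))  # 0
```

Using `Fraction` keeps the scores exact, so a threshold of exactly 16/21 leaves the label at #562, matching the "meets or exceeds" wording. A float threshold such as 0.8 also works because `Fraction` compares directly with `float`. In this toy tree the root also scores 20/21, so any threshold above 20/21 leaves the sequence unclassified, as the help text states.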
1:f093ba52debe | 2:317726be0703 |
---|---|
1 <tool id="kraken-filter" name="Filter Kraken" version="1.0.0"> | 1 <tool id="kraken-filter" name="Kraken-filter" version="1.1.0"> |
2 <description> | 2 <description> |
3 by confidence score | 3 filter classification by confidence score |
4 </description> | 4 </description> |
5 <macros> | 5 <macros> |
6 <import>macros.xml</import> | 6 <import>macros.xml</import> |
7 </macros> | 7 </macros> |
8 <command> | 8 <command> |
9 <![CDATA[ | 9 <![CDATA[ |
10 @SET_DATABASE_PATH@ && | 10 @SET_DATABASE_PATH@ && |
11 kraken-filter @INPUT_DATABASE@ --threshold $threshold "${input}" > "$filtered_output" | 11 kraken-filter @INPUT_DATABASE@ --threshold $threshold "${input}" > "$filtered_output" |
12 ]]> | 12 ]]> |
13 </command> | 13 </command> |
14 <inputs> | 14 <inputs> |
15 <param format="tabular" label="Kraken classified output" name="input" type="data" /> | 15 <param format="tabular" label="Kraken output" name="input" type="data" help="Select taxonomy classification produced by kraken"/> |
16 <param label="Confidence threshold" max="1" min="0" name="threshold" type="float" value="0" /> | 16 <param label="Confidence threshold" max="1" min="0" name="threshold" type="float" value="0" help="--threshold; A number between 0 and 1; default=0"/> |
17 <expand macro="input_database" /> | 17 <expand macro="input_database" /> |
18 </inputs> | 18 </inputs> |
19 <outputs> | 19 <outputs> |
20 <data format="tabular" name="filtered_output" /> | 20 <data format="tabular" name="filtered_output" /> |
21 </outputs> | 21 </outputs> |
22 <help> | 22 <help> |
23 <![CDATA[ | 23 <![CDATA[ |
24 | 24 |
25 ***Note that the database used must be the same as the one used to generate | 25 .. class:: warningmark |
26 the output file, or the report script may encounter problems.*** | |
27 | 26 |
28 A sequence label's score is a fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label, and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., they were queried against the database). Consider the example of the LCA mappings in Kraken's output given earlier: | 27 **Note**: the database used must be the same as the one used in the original Kraken run |
29 | 28 |
30 "562:13 561:4 A:31 0:1 562:3" would indicate that: | 29 ----- |
31 | 30 |
32 the first 13 k-mers mapped to taxonomy ID #562 | 31 **What it does** |
33 the next 4 k-mers mapped to taxonomy ID #561 | 32 |
34 the next 31 k-mers contained an ambiguous nucleotide | 33 At present, we have not yet developed a confidence score with a solid probabilistic interpretation for Kraken. However, we have developed a simple scoring scheme that has yielded good results for us, and we've made that available in the kraken-filter script. The approach we use allows a user to specify a threshold score in the [0,1] interval; the ``kraken-filter`` script then will adjust labels up the tree until the label's score (described below) meets or exceeds that threshold. If a label at the root of the taxonomic tree would not have a score exceeding the threshold, the sequence is called unclassified by ``kraken-filter``. |
35 the next k-mer was not in the database | 34 |
36 the last 3 k-mers mapped to taxonomy ID #562 | 35 A sequence label's score is a fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label, and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., they were queried against the database). Consider the example of the LCA mappings in Kraken's output:: |
36 | |
37 562:13 561:4 A:31 0:1 562:3 | |
38 | |
39 would indicate that:: | |
40 | |
41 the first 13 k-mers mapped to taxonomy ID #562 | |
42 the next 4 k-mers mapped to taxonomy ID #561 | |
43 the next 31 k-mers contained an ambiguous nucleotide | |
44 the next k-mer was not in the database | |
45 the last 3 k-mers mapped to taxonomy ID #562 | |
37 | 46 |
38 In this case, ID #561 is the parent node of #562. Here, a label of #562 for this sequence would have a score of C/Q = (13+3)/(13+4+1+3) = 16/21. A label of #561 would have a score of C/Q = (13+4+3)/(13+4+1+3) = 20/21. If a user specified a threshold over 16/21, kraken-filter would adjust the original label from #562 to #561; if the threshold was greater than 20/21, the sequence would become unclassified. | 47 In this case, ID #561 is the parent node of #562. Here, a label of #562 for this sequence would have a score of C/Q = (13+3)/(13+4+1+3) = 16/21. A label of #561 would have a score of C/Q = (13+4+3)/(13+4+1+3) = 20/21. If a user specified a threshold over 16/21, kraken-filter would adjust the original label from #562 to #561; if the threshold was greater than 20/21, the sequence would become unclassified. |
39 ]]> | 48 ]]> |
40 </help> | 49 </help> |
41 <expand macro="version_command" /> | 50 <expand macro="version_command" /> |