Mercurial > repos > devteam > kraken_filter
diff kraken-filter.xml @ 0:60d9479c58d6 draft
Uploaded
author | devteam |
---|---|
date | Wed, 22 Apr 2015 13:03:54 -0400 |
parents | |
children | f093ba52debe |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/kraken-filter.xml Wed Apr 22 13:03:54 2015 -0400 @@ -0,0 +1,44 @@ +<tool id="kraken-filter" name="Filter Kraken" version="1.0.0"> + <description> + by confidence score + </description> + <macros> + <import>macros.xml</import> + </macros> + <command> + <![CDATA[ + kraken-filter @INPUT_DATABASE@ --threshold $threshold "${input}" > "$filtered_output" + ]]> + </command> + <inputs> + <param format="tabular" label="Kraken classified output" name="input" type="data" /> + <param label="Confidence threshold" max="1" min="0" name="threshold" type="float" value="0" /> + <expand macro="input_database" /> + </inputs> + <outputs> + <data format="tabular" name="filtered_output" /> + </outputs> + <help> +<![CDATA[ + +***Note that the database used must be the same as the one used to generate +the output file, or the report script may encounter problems.*** + +A sequence label's score is a fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label, and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., they were queried against the database). Consider the example of the LCA mappings in Kraken's output given earlier: + +"562:13 561:4 A:31 0:1 562:3" would indicate that: + + the first 13 k-mers mapped to taxonomy ID #562 + the next 4 k-mers mapped to taxonomy ID #561 + the next 31 k-mers contained an ambiguous nucleotide + the next k-mer was not in the database + the last 3 k-mers mapped to taxonomy ID #562 + +In this case, ID #561 is the parent node of #562. Here, a label of #562 for this sequence would have a score of C/Q = (13+3)/(13+4+1+3) = 16/21. A label of #561 would have a score of C/Q = (13+4+3)/(13+4+1+3) = 20/21. If a user specified a threshold over 16/21, kraken-filter would adjust the original label from #562 to #561; if the threshold was greater than 20/21, the sequence would become unclassified. + ]]> + </help> + <expand macro="version_command" /> + <expand macro="requirements" /> + <expand macro="stdio" /> + <expand macro="citations" /> +</tool>