comparison kraken-filter.xml @ 6:ccfb9cbfcc72 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tool_collections/kraken/kraken_filter/ commit e8fc7c9dad5f583ad6763ecb9bd8c924832abacd
author iuc
date Mon, 07 Aug 2017 17:27:49 -0400
parents d246279116a4
children 6e690205b306
5:d246279116a4 6:ccfb9cbfcc72
1 <tool id="kraken-filter" name="Kraken-filter" version="1.2.1"> 1 <tool id="kraken-filter" name="Kraken-filter" version="@WRAPPER_VERSION@">
2 <description> 2 <description>filter classification by confidence score</description>
3 filter classification by confidence score
4 </description>
5 <macros> 3 <macros>
6 <import>macros.xml</import> 4 <import>macros.xml</import>
7 </macros> 5 </macros>
8 <expand macro="requirements" /> 6 <expand macro="requirements" />
9 <expand macro="stdio" />
10 <expand macro="version_command" /> 7 <expand macro="version_command" />
11 <command> 8 <command detect_errors="exit_code"><![CDATA[
12 <![CDATA[
13 @SET_DATABASE_PATH@ && 9 @SET_DATABASE_PATH@ &&
14 kraken-filter @INPUT_DATABASE@ --threshold $threshold "${input}" > "$filtered_output" 10
15 ]]> 11 kraken-filter
16 </command> 12 @INPUT_DATABASE@
13 --threshold $threshold
14 '${input}'
15 > '$filtered_output'
16 ]]></command>
17 <inputs> 17 <inputs>
18 <param format="tabular" label="Kraken output" name="input" type="data" help="Select taxonomy classification produced by kraken"/> 18 <param name="input" type="data" format="tabular" label="Kraken output" help="Select taxonomy classification produced by kraken"/>
19 <param label="Confidence threshold" max="1" min="0" name="threshold" type="float" value="0" help="--threshold; A number between 0 and 1; default=0"/> 19 <param argument="--threshold" type="float" value="0" min="0" max="1"
20 label="Confidence threshold" help="A floating point number between 0 and 1; default=0"/>
21
20 <expand macro="input_database" /> 22 <expand macro="input_database" />
21 </inputs> 23 </inputs>
22 <outputs> 24 <outputs>
23 <data format="tabular" name="filtered_output" /> 25 <data format="tabular" name="filtered_output" />
24 </outputs> 26 </outputs>
25 <tests> 27 <tests>
26 <test> 28 <test>
27 <param name="input" value="kraken_filter_test1.tab"/> 29 <param name="input" value="kraken_filter_test1.tab"/>
28 <param name="threshold" value="0"/> 30 <param name="threshold" value="0"/>
29 <param name="kraken_database" value="test_db"/> 31 <param name="kraken_database" value="test_db"/>
30 <output name="output" file="kraken_filter_test1_output.tab" ftype="tabular"/> 32
33 <output name="filtered_output" file="kraken_filter_test1_output.tab" ftype="tabular"/>
31 </test> 34 </test>
32 </tests> 35 </tests>
33 36
34 <help> 37 <help>
35 <![CDATA[ 38 <![CDATA[
44 47
45 At present, we have not yet developed a confidence score with a solid probabilistic interpretation for Kraken. However, we have developed a simple scoring scheme that has yielded good results for us, and we've made that available in the kraken-filter script. The approach we use allows a user to specify a threshold score in the [0,1] interval; the ``kraken-filter`` script then will adjust labels up the tree until the label's score (described below) meets or exceeds that threshold. If a label at the root of the taxonomic tree would not have a score exceeding the threshold, the sequence is called unclassified by ``kraken-filter``. 48 At present, we have not yet developed a confidence score with a solid probabilistic interpretation for Kraken. However, we have developed a simple scoring scheme that has yielded good results for us, and we've made that available in the kraken-filter script. The approach we use allows a user to specify a threshold score in the [0,1] interval; the ``kraken-filter`` script then will adjust labels up the tree until the label's score (described below) meets or exceeds that threshold. If a label at the root of the taxonomic tree would not have a score exceeding the threshold, the sequence is called unclassified by ``kraken-filter``.
46 49
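The thresholding approach described in the help text can be sketched in a few lines of Python. This is only an illustrative sketch, not the actual ``kraken-filter`` implementation. The toy ``PARENT`` map, the function names, and the convention that a return value of 0 means "unclassified" are assumptions made for this example. It also assumes that an ``A`` entry in the LCA field counts k-mers with an ambiguous nucleotide and that a ``0`` entry counts k-mers that were queried but had no hit.

```python
from fractions import Fraction

# Hypothetical toy taxonomy (child -> parent); 1 is the root, 0 means "no parent".
PARENT = {562: 561, 561: 1, 1: 0}


def parse_lca(field):
    """Return (Q, hits) for an LCA field such as '562:13 561:4 A:31 0:1 562:3'."""
    queried = 0
    hits = {}
    for token in field.split():
        key, count = token.split(":")
        count = int(count)
        if key == "A":
            # Assumed: ambiguous k-mers were never queried, so they do not count toward Q.
            continue
        queried += count
        taxid = int(key)
        if taxid != 0:
            # Assumed: taxid 0 marks k-mers that were queried but matched nothing.
            hits[taxid] = hits.get(taxid, 0) + count
    return queried, hits


def in_clade(taxid, clade_root):
    """True if taxid is clade_root or one of its descendants in PARENT."""
    while taxid != 0:
        if taxid == clade_root:
            return True
        taxid = PARENT.get(taxid, 0)
    return False


def score(label, queried, hits):
    """C/Q: k-mers mapped inside the clade rooted at label, over queried k-mers."""
    clade_hits = sum(n for taxid, n in hits.items() if in_clade(taxid, label))
    return Fraction(clade_hits, queried)


def filter_label(label, field, threshold):
    """Move the label up the tree until its score meets or exceeds the threshold."""
    queried, hits = parse_lca(field)
    if queried == 0:
        return 0
    node = label
    while node != 0:
        if score(node, queried, hits) >= threshold:
            return node
        node = PARENT.get(node, 0)
    return 0  # even the root fell short of the threshold: unclassified
```

Under these assumptions, the example field ``562:13 561:4 A:31 0:1 562:3`` gives Q = 21, a score of 16/21 for label 562 and 20/21 for label 561. A threshold of 0.8 would therefore move a 562 label up to 561.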
47 A sequence label's score is a fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label, and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., they were queried against the database). Consider the example of the LCA mappings in Kraken's output:: 50 A sequence label's score is a fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label, and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., they were queried against the database). Consider the example of the LCA mappings in Kraken's output::
48 51
49 562:13 561:4 A:31 0:1 562:3 52 562:13 561:4 A:31 0:1 562:3
50 53
51 would indicate that:: 54 would indicate that::
52 55
53 the first 13 k-mers mapped to taxonomy ID #562 56 the first 13 k-mers mapped to taxonomy ID #562
54 the next 4 k-mers mapped to taxonomy ID #561 57 the next 4 k-mers mapped to taxonomy ID #561