comparison kraken2tax.xml @ 0:5f655b0279cc draft

planemo upload for repository https://github.com/galaxyproject/tools-devteam/blob/master/tool_collections/taxonomy/kraken2tax/ commit 5a4e0ca9992af3a6e5ed2b533f04bb82ce761e0b
author devteam
date Mon, 09 Nov 2015 12:27:14 -0500
parents
children 65068f909cc7
-1:000000000000 0:5f655b0279cc
1 <tool id="Kraken2Tax" name="Convert Kraken" version="1.1">
2 <description>data to Galaxy taxonomy representation</description>
3 <requirements>
4 <requirement type="package" version="4.1.0">gnu_awk</requirement>
5 <requirement type="package" version="8d245994d7">gb_taxonomy</requirement>
6 </requirements>
7 <command>
8 <![CDATA[
9 awk '{ print \$${read_name}, \$${tax_id} }' OFS="\t" "${input}" | taxonomy-reader "${ncbi_taxonomy.fields.path}/names.dmp" "${ncbi_taxonomy.fields.path}/nodes.dmp" 1 > "${out_file}"
10 ]]>
11 </command>
12 <inputs>
13 <param format="tabular" name="input" type="data" label="Choose dataset to convert"/>
14 <param label="Select a taxonomy database" name="ncbi_taxonomy" type="select">
15 <options from_data_table="ncbi_taxonomy">
16 <validator message="No built-in databases are available" type="no_options" />
17 </options>
18 </param>
19 <param name="read_name" label="Read name" type="data_column" data_ref="input" value="2" help="Select column containing read names"/>
20 <param name="tax_id" label="Taxonomy ID field" type="data_column" data_ref="input" numerical="True" value="3" help="Select column containing taxonomy ID"/>
21 </inputs>
22 <outputs>
23 <data format="taxonomy" name="out_file" />
24 </outputs>
25 <tests>
26 <test>
27 <param name="input" ftype="tabular" value="kraken2tax.txt"/>
28 <param name="read_name" value="2"/>
29 <param name="tax_id" value="3"/>
30 <output name="out_file" file="kraken2tax-test1.txt"/>
31 </test>
32 </tests>
33 <help>
34
35 .. class:: infomark
36
37 Use *Filter and Sort->Filter* to restrict the output of this tool to the desired taxonomic ranks. You can also use *Text Manipulation->Cut* to remove unwanted columns from the output.
38
39 ------
40
41 **What it does**
42
43 This tool translates results of the Kraken metagenomic classifier (see citations below) into the full NCBI taxonomy representation, using the taxonomic ID field provided by Kraken. The output of this tool can be visualized directly with the Krona tool. It is based on `gb_taxonomy_tools` developed by https://github.com/spond.
44
45 -------
46
47 **Example**
48
49 Suppose you have Kraken output that looks like this (here the second field is the name of a sequencing read and the third is the taxonomic ID)::
50
51 C Read_1 9606 465 Q:1
52
53 and you want to obtain the full taxonomic representation for this read. Setting the **Read name** and **Taxonomy ID field** parameters to **2** and **3**, respectively, will produce the following output (you may need to scroll sideways to see the entire line)::
54
55 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
56 Read_1 9606 root Eukaryota Metazoa n n Chordata Craniata Gnathostomata Mammalia n Euarchontoglires Primates Haplorrhini Hominoidea Hominidae n n n Homo n Homo sapiens n
57
58 In other words, the tool printed *Read name* and *Taxonomy ID field*, then appended 22 columns containing the taxonomic ranks from root to subspecies. Below is a formal definition of the output columns::
59
60 Column Definition
61 ------- -------------------------------------------------
62 1 Name (specified by 'Read name' dropdown)
63 2 taxID (specified by 'Taxonomy ID field' dropdown)
64 3 root
65 4 superkingdom
66 5 kingdom
67 6 subkingdom
68 7 superphylum
69 8 phylum
70 9 subphylum
71 10 superclass
72 11 class
73 12 subclass
74 13 superorder
75 14 order
76 15 suborder
77 16 superfamily
78 17 family
79 18 subfamily
80 19 tribe
81 20 subtribe
82 21 genus
83 22 subgenus
84 23 species
85 24 subspecies
86
87 ------
88
89 .. class:: warningmark
90
91 **Why do I have these "n" things?**
92
93 Be aware that the NCBI taxonomy (ftp://ftp.ncbi.nih.gov/pub/taxonomy/) that this tool relies upon is incomplete. For many species, one or more ranks are absent and are represented as "**n**". In the above example, *subkingdom*, *superphylum*, and several other ranks are missing.
94
95
96 </help>
97 <citations>
98 <citation type="doi">10.1186/gb-2014-15-3-r46</citation>
99 <citation type="doi">10.1101/gr.094508.109</citation>
100 </citations>
101 </tool>
102
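The first stage of the tool's command uses awk to print the read-name and taxonomy-ID columns, joined by a tab. As a rough illustration of that step only, the Python sketch below mirrors it. The function name `select_columns` and its parameter names are hypothetical and are not part of the tool. It assumes whitespace-separated input, which matches awk's default field splitting.

```python
def select_columns(line, read_name_col=2, tax_id_col=3):
    """Return 'read_name<TAB>tax_id' for one line of Kraken output.

    Column numbers are 1-based, like awk's $2 and $3. Fields are split
    on runs of whitespace, mirroring awk's default field separator.
    """
    fields = line.split()
    return "\t".join((fields[read_name_col - 1], fields[tax_id_col - 1]))


# The example line from the tool's help text.
example = "C\tRead_1\t9606\t465\tQ:1"
print(select_columns(example))
```

With the defaults of 2 and 3, the example line reduces to the read name and the taxonomy ID, which is the two-column input that the second stage (`taxonomy-reader`) receives.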
103
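The column definitions in the help text can also be applied to an output row programmatically. The sketch below labels the 22 rank columns and lists the ranks reported as "n". It assumes the output is tab-separated, which the help text does not state outright, although the value "Homo sapiens" in the example suggests it. The names `RANKS`, `label_row`, and `missing_ranks` are hypothetical.

```python
# Rank names for output columns 3 to 24, as listed in the help text.
RANKS = [
    "root", "superkingdom", "kingdom", "subkingdom", "superphylum",
    "phylum", "subphylum", "superclass", "class", "subclass",
    "superorder", "order", "suborder", "superfamily", "family",
    "subfamily", "tribe", "subtribe", "genus", "subgenus",
    "species", "subspecies",
]


def label_row(line):
    """Split one output row into (name, tax_id, {rank: value})."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2 + len(RANKS):
        raise ValueError(f"expected {2 + len(RANKS)} columns, got {len(fields)}")
    name, tax_id = fields[0], fields[1]
    return name, tax_id, dict(zip(RANKS, fields[2:]))


def missing_ranks(lineage):
    """Return the ranks whose value is the placeholder 'n'."""
    return [rank for rank, value in lineage.items() if value == "n"]


# The example output row from the help text, rebuilt with tab separators.
row = "\t".join([
    "Read_1", "9606", "root", "Eukaryota", "Metazoa", "n", "n", "Chordata",
    "Craniata", "Gnathostomata", "Mammalia", "n", "Euarchontoglires",
    "Primates", "Haplorrhini", "Hominoidea", "Hominidae", "n", "n", "n",
    "Homo", "n", "Homo sapiens", "n",
])
name, tax_id, lineage = label_row(row)
print(name, tax_id, lineage["species"])
print(missing_ranks(lineage))
```

For the example row, the missing ranks are subkingdom, superphylum, subclass, subfamily, tribe, subtribe, subgenus, and subspecies.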