changeset 0:0ec6c5781381 draft
Uploaded
| field | value |
|---|---|
| author | jjohnson |
| date | Tue, 09 Oct 2012 12:13:27 -0400 |
| parents | (none) |
| children | 868624c77941 |
| files | metaphlan.xml metaphlan_to_phyloxml.py metaphlan_to_phyloxml.xml |
| diffstat | 3 files changed, 184 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/metaphlan.xml Tue Oct 09 12:13:27 2012 -0400
@@ -0,0 +1,91 @@
+<tool id="metaphlan" name="MetaPhlAn" version="1.6.0">
+<requirements>
+<requirement type="package">metaphlan</requirement>
+<requirement type="package">bowtie2</requirement>
+</requirements>
+  <description>Metagenomic Phylogenetic Analysis</description>
+  <command>
+    metaphlan.py
+    $input
+    --bowtie2db ${GALAXY_DATA_INDEX_DIR}/shared/metaphlan/bowtie2db/mpa
+    --no_map
+    -o $output
+    --bt2_ps $PresetsForBowtie2
+  </command>
+
+  <inputs>
+    <param format="fasta" name="input" type="data" label="Input metagenome (multi-fasta of metagenomic reads, loaded with the Get Data module, see below for an example)"></param>
+    <param name="PresetsForBowtie2" type="select" format="text">
+      <label>Sensitivity options for read-marker similarity (as described by BowTie2)</label>
+      <option value="very-sensitive-local">Very Sensitive Local</option>
+      <option value="sensitive-local">Sensitive Local</option>
+      <option value="very-sensitive">Very Sensitive</option>
+      <option value="sensitive">Sensitive</option>
+    </param>
+</inputs>
+<outputs>
+  <data format="tabular" name="output" />
+</outputs>
+
+<tests>
+</tests>
+
+<help>
+
+.. class:: infomark
+
+**Input example:** You can try out MetaPhlAn using the synthetic dataset (250,000 reads) available at: http://huttenhower.sph.harvard.edu/sites/default/files/LC1.fna . There is no need to download the file; you can just copy and paste the dataset address into the "Upload File" module inside the "Load Data" link in the left panel.
+
+.. class:: infomark
+
+**Computational time:** Unless the server is overloaded, you should expect the tool to process ~10,000 reads per second. The synthetic metagenome linked above (250,000 reads) should take no more than 30 seconds to complete.
+
+.. class:: infomark
+
+**Tip:** If your input is in FASTQ, you can convert it to FASTA using the corresponding Galaxy module included in the "Convert Format" tools.
+
+---------
+
+**What it does**
+
+MetaPhlAn (Metagenomic Phylogenetic Analysis) is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from reference genomes, allowing orders-of-magnitude speedups and unambiguous taxonomic assignments.
+
+Although MetaPhlAn can use both BlastN and BowTie2 in the read-to-marker mapping step, this Galaxy module uses only BowTie2 for computational reasons.
+
+For additional information about MetaPhlAn and the MetaPhlAn command-line package, please refer to http://huttenhower.sph.harvard.edu/metaphlan or to the paper cited below. Please note that most of the additional parameters available in the command-line version are set here to their default values.
+
+---------
+
+**Inputs**
+
+The input file must be a multi-fasta file containing metagenomic reads loaded with the "Get Data" module in the left panel. Reads can be as short as ~40 nt, although lengths above 70 nt are recommended.
+
+A synthetic metagenome you can use as sample input is available at http://huttenhower.sph.harvard.edu/sites/default/files/LC1.fna
+
+**Outputs**
+
+The output is a two-column, tab-separated plain-text file reporting the microbial clades predicted to be present in the metagenomic sample and their relative abundances.
+
+All taxonomic levels from domain to species are reported, and each higher taxonomic level contains the sum of the abundances of its taxonomic leaf nodes (usually species) and, possibly, some lower-level "unclassified" clades.
+
+-----
+
+**Citation and contacts**
+
+If you find MetaPhlAn useful in your research, please cite our paper:
+
+| `Nicola Segata`_, Levi Waldron, Annalisa Ballarini, Vagheesh Narasimhan, Olivier Jousson, `Curtis Huttenhower`_.
+| **"Fast and accurate metagenomic profiling of microbial community composition using unique clade-specific marker genes"**
+| Nature Methods, 2012 (in press)
+
+.. _Nicola Segata: nsegata@hsph.harvard.edu
+.. _Curtis Huttenhower: chuttenh@hsph.harvard.edu
+
+If you have any questions or comments, feel free to `contact us`_. Additional information is available at http://huttenhower.sph.harvard.edu/metaphlan and in the FAQ on the same page. You can also join and use our user group at https://groups.google.com/d/forum/metaphlan-users
+
+.. _contact us: nsegata@hsph.harvard.edu
+
+
+  </help>
+</tool>
+
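The help text above states that each higher taxonomic level holds the sum of the abundances of the clades below it. The following is a minimal Python 3 sketch for checking that property on a MetaPhlAn output file. `parse_profile` and `unexplained_abundance` are hypothetical helper names, not part of MetaPhlAn. The line format (clade path joined by `|`, then whitespace, then the abundance) is assumed from the example comment in metaphlan_to_phyloxml.py, and the skipping of blank and `#` lines is an added assumption.

```python
from collections import defaultdict


def parse_profile(lines):
    """Map each clade path (tuple of 'x__Name' parts) to its float abundance.

    Assumes MetaPhlAn-style lines "<clade path><whitespace><abundance>";
    blank lines and lines starting with "#" are skipped (an assumption).
    """
    profile = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        taxa, abundance = line.split()
        profile[tuple(taxa.split("|"))] = float(abundance)
    return profile


def unexplained_abundance(profile, root_total=100.0):
    """For each clade with listed sub-clades, return its abundance minus the
    sum of its direct children. The implicit root is assumed to be root_total."""
    child_sums = defaultdict(float)
    for path, abundance in profile.items():
        child_sums[path[:-1]] += abundance  # credit each clade to its parent
    remainders = {}
    for parent, total in child_sums.items():
        parent_value = root_total if parent == () else profile.get(parent)
        if parent_value is not None:  # skip parents missing from the file
            remainders[parent] = parent_value - total
    return remainders


if __name__ == "__main__":
    # The four lines quoted in the comment of metaphlan_to_phyloxml.py.
    example = [
        "k__Bacteria 99.07618",
        "k__Archaea 0.92382",
        "k__Bacteria|p__Proteobacteria 82.50732",
        "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria 81.64905",
    ]
    for parent, rest in sorted(unexplained_abundance(parse_profile(example)).items()):
        print("|".join(parent) or "(root)", round(rest, 5))
```

On this four-line excerpt, the remainder for k__Bacteria is about 16.57, because the other phyla are not among the quoted lines. On a complete profile, one would expect remainders close to zero if every sub-clade (including any "unclassified" ones) is listed.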
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/metaphlan_to_phyloxml.py Tue Oct 09 12:13:27 2012 -0400
@@ -0,0 +1,76 @@
+#!/usr/bin/env python
+
+"""
+Read MetaPhlAn output summarizing taxonomic distribution and format it as PhyloXML
+
+usage: %prog metaphlan.txt phylo.xml
+"""
+
+import sys
+
+# Metaphlan output looks like:
+# k__Bacteria 99.07618
+# k__Archaea 0.92382
+# k__Bacteria|p__Proteobacteria 82.50732
+# k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria 81.64905
+
+rank_map = { 'k__': 'kingdom', 'p__': 'phylum', 'c__': 'class', 'o__': 'order', 'f__': 'family', 'g__': 'genus', 's__': 'species' }
+
+class Node( object ):
+    """Node in a taxonomy"""
+    def __init__( self, rank=None, name=None ):
+        self.rank = rank
+        self.name = name
+        self.value = None
+        self.children = dict()
+    @staticmethod
+    def from_metaphlan_file( file ):
+        """
+        Build tree from metaphlan output
+        """
+        root = Node()
+        for line in file:
+            taxa, abundance = line.split()
+            parts = taxa.split( "|" )
+            root.add( parts, abundance )
+        return root
+    def add( self, parts, value ):
+        """
+        Parts is a list of node names, recursively add nodes until we reach
+        the last part, and then attach the value to that node.
+        """
+        if len( parts ) == 0:
+            self.value = value
+        else:
+            next_part = parts.pop(0)
+            rank = rank_map[ next_part[:3] ]
+            name = next_part[3:]
+            if name not in self.children:
+                self.children[name] = Node( rank, name )
+            self.children[name].add( parts, value )
+    def __str__( self ):
+        if self.children:
+            return "(" + ",".join( str( child ) for child in self.children.itervalues() ) + "):" + self.name
+        else:
+            return self.name
+    def to_phyloxml( self, out ):
+        print >>out, "<clade>"
+        if self.name:
+            print >>out, "<name>%s</name>" % self.name
+            print >>out, "<taxonomy><scientific_name>%s</scientific_name><rank>%s</rank></taxonomy>" % ( self.name, self.rank )
+        if self.value:
+            print >>out, "<property datatype='xsd:float' ref='metaphlan:abundance' applies_to='node'>%s</property>" % self.value
+        ## print >>out, "<confidence type='abundance'>%s</confidence>" % self.value
+        for child in self.children.itervalues():
+            child.to_phyloxml( out )
+        print >>out, "</clade>"
+
+out = open( sys.argv[2], 'w' )
+
+print >>out, '<phyloxml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.phyloxml.org" xsi:schemaLocation="http://www.phyloxml.org http://www.phyloxml.org/1.10/phyloxml.xsd">'
+print >>out, '<phylogeny rooted="true">'
+
+Node.from_metaphlan_file( open( sys.argv[1] ) ).to_phyloxml( out )
+
+print >>out, '</phylogeny>'
+print >>out, '</phyloxml>'
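The script above targets Python 2 (`print >>out`, `dict.itervalues`) and writes taxon names into the XML with `%s`. As a result, a name containing `&` or `<` would produce malformed XML. The following is a possible Python 3 variant built on `xml.etree.ElementTree`, which handles the escaping. It is a sketch, not the author's code. The function name `metaphlan_to_phyloxml` and the skipping of blank and `#` lines are assumptions. The element order inside each `<clade>` (name, taxonomy, property, child clades) follows the original script.

```python
import sys
import xml.etree.ElementTree as ET

# Same prefix-to-rank table as rank_map in the original script.
RANK_MAP = {"k__": "kingdom", "p__": "phylum", "c__": "class", "o__": "order",
            "f__": "family", "g__": "genus", "s__": "species"}
PHYLOXML_NS = "http://www.phyloxml.org"


class Node:
    """Node in a taxonomy; mirrors the Node class of the original script."""

    def __init__(self, rank=None, name=None):
        self.rank = rank
        self.name = name
        self.value = None
        self.children = {}

    def add(self, parts, value):
        """Walk or create one child per 'x__Name' part, then store the value."""
        node = self
        for part in parts:
            name = part[3:]
            if name not in node.children:
                node.children[name] = Node(RANK_MAP[part[:3]], name)
            node = node.children[name]
        node.value = value

    def to_element(self):
        """Build the <clade> element; ElementTree escapes text for us."""
        clade = ET.Element("clade")
        if self.name:
            ET.SubElement(clade, "name").text = self.name
            taxonomy = ET.SubElement(clade, "taxonomy")
            ET.SubElement(taxonomy, "scientific_name").text = self.name
            ET.SubElement(taxonomy, "rank").text = self.rank
        if self.value is not None:
            prop = ET.SubElement(clade, "property", datatype="xsd:float",
                                 ref="metaphlan:abundance", applies_to="node")
            prop.text = self.value
        for child in self.children.values():
            clade.append(child.to_element())
        return clade


def metaphlan_to_phyloxml(lines):
    """Return a PhyloXML document (as a string) for MetaPhlAn output lines."""
    root = Node()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: skip blanks/comments
            continue
        taxa, abundance = line.split()
        root.add(taxa.split("|"), abundance)
    phyloxml = ET.Element("phyloxml", xmlns=PHYLOXML_NS)
    phylogeny = ET.SubElement(phyloxml, "phylogeny", rooted="true")
    phylogeny.append(root.to_element())
    return ET.tostring(phyloxml, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Same command-line contract as the original: input file, output file.
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write(metaphlan_to_phyloxml(src))
```

The original's recursive `add` pops items from the list it is given. This sketch replaces it with a loop that leaves the caller's list untouched, and both versions build the same tree.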
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/metaphlan_to_phyloxml.xml Tue Oct 09 12:13:27 2012 -0400
@@ -0,0 +1,17 @@
+<tool id="meta_to_phylo" name="MetaPhlAn to PhyloXML" version="1.0.0">
+  <description>Converter</description>
+  <command interpreter="python">
+metaphlan_to_phyloxml.py $input $output
+  </command>
+  <inputs>
+    <param name="input" type="data" format="tabular" label="Input MetaPhlAn File"/>
+  </inputs>
+  <outputs>
+    <data format="xml" name="output" label="${tool.name} on ${on_string}" />
+  </outputs>
+  <tests>
+  </tests>
+  <help>
+    MetaPhlAn to PhyloXML Converter
+  </help>
+</tool>
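The converter declares its output as a generic `xml` dataset. To sanity-check such a file downstream, the following short Python 3 sketch reads it back with `xml.etree.ElementTree` and lists each named clade's rank and abundance. It assumes the element layout written by metaphlan_to_phyloxml.py: a default namespace of http://www.phyloxml.org, `<taxonomy><rank>` under each named clade, and the abundance in a `<property ref="metaphlan:abundance">` element. The function `clade_abundances` and the embedded `SAMPLE` are hypothetical and were hand-written for illustration. They are not real converter output.

```python
import xml.etree.ElementTree as ET

NS = {"px": "http://www.phyloxml.org"}

# Hand-written sample shaped like the converter's output (hypothetical data).
SAMPLE = """<phyloxml xmlns="http://www.phyloxml.org">
<phylogeny rooted="true">
<clade>
<clade>
<name>Bacteria</name>
<taxonomy><scientific_name>Bacteria</scientific_name><rank>kingdom</rank></taxonomy>
<property datatype='xsd:float' ref='metaphlan:abundance' applies_to='node'>99.07618</property>
<clade>
<name>Proteobacteria</name>
<taxonomy><scientific_name>Proteobacteria</scientific_name><rank>phylum</rank></taxonomy>
<property datatype='xsd:float' ref='metaphlan:abundance' applies_to='node'>82.50732</property>
</clade>
</clade>
</clade>
</phylogeny>
</phyloxml>"""


def clade_abundances(xml_text):
    """Return (rank, name, abundance) for every named clade, in document order."""
    rows = []
    root = ET.fromstring(xml_text)
    for clade in root.iter("{%s}clade" % NS["px"]):
        name = clade.findtext("px:name", namespaces=NS)
        if name is None:
            continue  # the unnamed root clade carries no taxonomy
        rank = clade.findtext("px:taxonomy/px:rank", namespaces=NS)
        prop = clade.find("px:property[@ref='metaphlan:abundance']", namespaces=NS)
        abundance = float(prop.text) if prop is not None else None
        rows.append((rank, name, abundance))
    return rows


if __name__ == "__main__":
    for rank, name, abundance in clade_abundances(SAMPLE):
        print(rank, name, abundance)
```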