Mercurial > repos > bgruening > antismash

<tool id="antismash" name="Secondary Metabolites" version="2.0.2.0">
    <description>and Antibiotics Analysis (antiSMASH)</description>
    <requirements>
        <requirement type="package" version="3.0">hmmer</requirement>
        <requirement type="package" version="2.3.2">hmmer</requirement>
        <requirement type="package" version="2.2.28">blast+</requirement>
        <requirement type="package" version="3.8.31">muscle</requirement>
        <requirement type="package" version="2.0.2">antismash_python_deps</requirement>
        <requirement type="package" version="2.0.2">antismash</requirement>
    </requirements>
    <command>
        run_antismash.py

            --input $infile
            --cpus 4
            #set $type_list = ','.join([$type for $type in $types])
            --enable $type_list
            --input-type nucl
            $smcogs
            $clusterblast
            $subclusterblast
            $inclusive
            $full_hmmer
            $full_blast

            --pfamdir  $pfam_database.fields.path

            ## leave out the start and end features, it can be easily replaced with Galaxy tools
            ##--from START          Start analysis at nucleotide specified
            ##--to END

    </command>
    <inputs>
        <param name="infile" type="data" format="gb,embl" label="Nucleotide sequence file in GenBank or EMBL format"/>

        <param name="clusterblast" type="boolean" label="BLAST identified clusters against known clusters" truevalue="--clusterblast" falsevalue="" checked="True" />
        <param name="smcogs" type="boolean" label="analysis of secondary metabolism gene families (smCOGs)"
            falsevalue="" truevalue="--smcogs" checked="True" />

        <param name="full_blast" type="boolean" label="Run a whole-genome BLAST analysis" truevalue="--full-blast" falsevalue="" checked="False" />
        <param name="subclusterblast" type="boolean" label="Subcluster Blast analysis" truevalue="--subclusterblast" falsevalue="" checked="false" />
        <param name="full_hmmer" type="boolean" label="Run a whole-genome Pfam analysis" truevalue="--full-hmmer" falsevalue="" checked="false" />

        <param name="inclusive" type="boolean" label="Use inclusive algorithm for cluster detection" truevalue="--inclusive" falsevalue="" checked="false" />

        <param name="pfam_database" type="select" label="Pfam database" help="Pfam Covariance models">
            <options from_file="antismash.loc">
              <column name="value" index="0"/>
              <column name="name" index="1"/>
              <column name="path" index="2"/>
            </options>
        </param>

        <param name="types" type="select" display="checkboxes" multiple="true" label="Gene cluster types to search">
            <option value="t1pks" selected="True">type I polyketide synthases</option>
            <option value="t2pks" selected="True">type II polyketide synthases</option>
            <option value="t3pks" selected="True">type III polyketide synthases</option>
            <option value="t4pks" selected="True">type IV polyketide synthases</option>
            <option value="transatpks" selected="True">trans-AT PKS</option>
            <option value="nrps" selected="True">nonribosomal peptide synthetases</option>
            <option value="terpene" selected="True">terpene synthases</option>
            <option value="lantipeptide" selected="True">lantipeptides</option>
            <option value="bacteriocin" selected="True">bacteriocins</option>
            <option value="blactam" selected="True">beta-lactams</option>
            <option value="amglyccycl" selected="True">aminoglycosides / aminocyclitols</option>
            <option value="aminocoumarin" selected="True">aminocoumarins</option>
            <option value="siderophore" selected="True">siderophores</option>
            <option value="ectoine" selected="True">ectoines</option>
            <option value="butyrolactone" selected="True">butyrolactones</option>
            <option value="indole" selected="True">indoles</option>
            <option value="nucleoside" selected="True">nucleosides</option>
            <option value="phosphoglycolipid" selected="True">phosphoglycolipids</option>
            <option value="oligosaccharide" selected="True">oligosaccharides</option>
            <option value="furan" selected="True">furans</option>
            <option value="hserlactone" selected="True">hserlactones</option>
            <option value="thiopeptide" selected="True">thiopeptides</option>
            <option value="phenazine" selected="True">phenazines</option>
            <option value="phosphonate" selected="True">phosphonates</option>
            <option value="others" selected="True">others</option>
        </param>

    </inputs>
    <outputs>
        <data format="fasta" name="geneclusterprots" label="${tool.name} on ${on_string} (Gen Cluster Proteins)" />
        <data format="tabular" name="zip" label="${tool.name} on ${on_string} (all files compressed)" />
        <data format="html" name="html" label="${tool.name} on ${on_string} (html report)" />
        <data name="embl" format="text" label="${tool.name} on ${on_string} EMBL Output Format">
          <filter>(wg_blast == True or pfam == True)</filter>
        </data>
    </outputs>
  <help>

.. class:: infomark

**What it does**

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes.
It integrates and cross-links with a large number of in silico secondary metabolite analysis tools that have been published earlier.


**Input**

The ideal input for antiSMASH is an annotated nucleotide file in Genbank format or EMBL format. If no annotation is available,
we recommend running your sequence through an annotation pipeline like RAST are one included in Galaxy.


There are several optional analyses that may or may not be run on your sequence.
Highly recommended is the Gene Cluster Blast Comparative Analysis, which runs BlastP using each amino acid sequence from a detected gene cluster as a
query on a large database of predicted protein sequences from secondary metabolite biosynthetic gene clusters, and pools the results to identify
the gene clusters that are most homologous to the gene cluster that was detected in your query nucleotide sequence.


Also available is the analysis of secondary metabolism gene families (smCOGs).
This analysis attempts to allocate each gene in the detected gene clusters to a secondary metabolism-specific gene
family using profile hidden Markov models specific for the conserved sequence region characteristic of this family.
Additionally, a phylogenetic tree is constructed of each gene together with the (max. 100) sequences of the smCOG seed alignment.


For the most thorough genome analysis, we provide genome-wide PFAM HMM analysis of all genes in the genome through modules of the CLUSEAN pipeline.
Of course, some regions important to secondary metabolism may have been missed in the gene cluster identification stage
(e.g. because they represent the biosynthetic pathway of a yet unknown secondary metabolite).
Therefore, when genome-wide PFAM HMM analysis is selected, the PFAM frequencies are also used to find all genome regions in which PFAM domains typical for secondary metabolism are overrepresented.


**References**

Marnix H. Medema, Kai Blin, Peter Cimermancic, Victor de Jager, Piotr Zakrzewski, Michael A. Fischbach, Tilmann Weber,
Rainer Breitling and Eriko Takano (2011). antiSMASH: Rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. Nucleic Acids Research, doi: 10.1093/nar/gkr466.

http://antismash.secondarymetabolites.org/help.html

  </help>
</tool>
author	bgruening
date	Wed, 09 Oct 2013 11:14:23 -0400
parents	b11e1dfbc7c9
children	9cfa2fb488b0