view additional/glimmer3-long-orfs-wrapper.xml @ 6:a07c49839f31

Uploaded
author bgruening
date Sun, 09 Jun 2013 07:57:22 -0400
parents
children
line wrap: on
line source

<tool id="glimmer_long-orfs" name="long ORFs" version="0.1">
    <description>identify long, non-overlapping ORFs (glimmer)</description>
    <requirements>
        <requirement type="package" version="3.02b">glimmer</requirement>
    </requirements>
    <command>
    long-orfs
        -n -t
        $cutoff
        $inputfile
        $output
        2>&#38;1
    </command>
    <inputs>
        <param name="inputfile" type="data" format="fasta" label="Genome Sequence" help="Dataset missing? See TIP below"/>
        <param name='cutoff' type='float' label='cutoff' value='1.5'/>
    </inputs>
    <outputs>
        <data format="tabular" name="output" />
    </outputs>
    <tests>
        <test>
            <param name="inputfile" value='glimmer3/seqTest.fa'/>
            <param name='cutoff' value='1.5'/>
            <output name="output" file='glimmer3/longORFSTestOutput.dat'/>
        </test>
    </tests>
    <help>

**What it does**

    This program identifies long, non-overlapping open reading frames (orfs) in a DNA sequence file. 
    These orfs are very likely to contain genes, and can be used as a set of training sequences 
    More specifically, among all orfs longer than a minimum length , those that do not overlap any others are output. The start codon used for
    each orf is the first possible one. The program, by default, automatically determines the
    value that maximizes the number of orfs that are output. With the -t option, the initial
    set of candidate orfs also can be filtered using entropy distance, which generally produces
    a larger, more accurate training set, particularly for high-GC-content genomes. 



-----

**Glimmer Overview**

::

**************		**************		**************		**************		
*            *		*	     *		*            *		*            *
* long-orfs  *  ===>	*   Extract  *	===>	* build-icm  *  ===>	*  glimmer3  *	
*            *		*	     *		*	     *  	*	     *	
**************		**************		**************		**************

-----

**Example**


* input::
 
	-Genome Sequence

	CELF22B7  C.aenorhabditis elegans (Bristol N2) cosmid F22B7
	GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT
	GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT
	TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT
	TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC
	GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA
	ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG
	AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA
	CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA
	TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC
	AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA
	GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC
	AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC
	CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA
	AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC
	GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT
	.....
	
	- Cutoff 1.5	

* output::

	Sequence file = /home/mohammed/galaxy-central/database/files/000/dataset_34.dat
	Excluded regions file = none
	Circular genome = true
	Initial minimum gene length = 90 bp
	Determine optimal min gene length to maximize number of genes
	Maximum overlap bases = 30
	Start codons = atg,gtg,ttg
	Stop codons = taa,tag,tga
	Sequence length = 40222
	Final minimum gene length = 97

	Putative Genes:
	00001   40137      52  +2   0.892
	00002    1319    1095  -3   0.654
	00003    1555    1391  -2   0.793
	00004    1953    2066  +3   1.078
	00005    2045    2146  +2   0.919
	00006    4463    4759  +2   0.985
	00007    6785    6582  -3   1.033
	00008    6862    7020  +1   0.915
	00009    7300    7488  +1   0.900
	00010    7463    7570  +2   0.912
	00011    8399    8527  +2   1.044
	00012   10652   10545  -3   0.895
	00013   12170   12066  -3   1.108
	00014   13891   13748  -2   0.998
	00015   14157   14044  -1   1.026
	00016   15285   15410  +3   0.928
	00017   15829   15704  -2   0.949

	....

-------

**References**

A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007).


    </help>
</tool>