Mercurial > repos > bgruening > glimmer
view additional/glimmer3-long-orfs-wrapper.xml @ 6:a07c49839f31
Uploaded
author | bgruening |
---|---|
date | Sun, 09 Jun 2013 07:57:22 -0400 |
parents | |
children |
line wrap: on
line source
<tool id="glimmer_long-orfs" name="long ORFs" version="0.1"> <description>identify long, non-overlapping ORFs (glimmer)</description> <requirements> <requirement type="package" version="3.02b">glimmer</requirement> </requirements> <command> long-orfs -n -t $cutoff $inputfile $output 2>&1 </command> <inputs> <param name="inputfile" type="data" format="fasta" label="Genome Sequence" help="Dataset missing? See TIP below"/> <param name='cutoff' type='float' label='cutoff' value='1.5'/> </inputs> <outputs> <data format="tabular" name="output" /> </outputs> <tests> <test> <param name="inputfile" value='glimmer3/seqTest.fa'/> <param name='cutoff' value='1.5'/> <output name="output" file='glimmer3/longORFSTestOutput.dat'/> </test> </tests> <help> **What it does** This program identifies long, non-overlapping open reading frames (orfs) in a DNA sequence file. These orfs are very likely to contain genes, and can be used as a set of training sequences More specifically, among all orfs longer than a minimum length , those that do not overlap any others are output. The start codon used for each orf is the first possible one. The program, by default, automatically determines the value that maximizes the number of orfs that are output. With the -t option, the initial set of candidate orfs also can be filtered using entropy distance, which generally produces a larger, more accurate training set, particularly for high-GC-content genomes. ----- **Glimmer Overview** :: ************** ************** ************** ************** * * * * * * * * * long-orfs * ===> * Extract * ===> * build-icm * ===> * glimmer3 * * * * * * * * * ************** ************** ************** ************** ----- **Example** * input:: -Genome Sequence CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT ..... - Cutoff 1.5 * output:: Sequence file = /home/mohammed/galaxy-central/database/files/000/dataset_34.dat Excluded regions file = none Circular genome = true Initial minimum gene length = 90 bp Determine optimal min gene length to maximize number of genes Maximum overlap bases = 30 Start codons = atg,gtg,ttg Stop codons = taa,tag,tga Sequence length = 40222 Final minimum gene length = 97 Putative Genes: 00001 40137 52 +2 0.892 00002 1319 1095 -3 0.654 00003 1555 1391 -2 0.793 00004 1953 2066 +3 1.078 00005 2045 2146 +2 0.919 00006 4463 4759 +2 0.985 00007 6785 6582 -3 1.033 00008 6862 7020 +1 0.915 00009 7300 7488 +1 0.900 00010 7463 7570 +2 0.912 00011 8399 8527 +2 1.044 00012 10652 10545 -3 0.895 00013 12170 12066 -3 1.108 00014 13891 13748 -2 0.998 00015 14157 14044 -1 1.026 00016 15285 15410 +3 0.928 00017 15829 15704 -2 0.949 .... ------- **References** A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007). </help> </tool>