changeset 7:d2eee6e51790

Uploaded
author bgruening
date Mon, 10 Jun 2013 16:00:34 -0400
parents a07c49839f31
children ec5cf10b8db7
files additional/gbk2orf.xml additional/gbk_to_orf.py additional/glimmer2gff.py additional/glimmer2gff.xml additional/glimmer3-extract-wrapper.xml additional/glimmer3-long-orfs-wrapper.xml additional/glimmer_acgt_content.xml
diffstat 7 files changed, 0 insertions(+), 679 deletions(-)
--- a/additional/gbk2orf.xml	Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,212 +0,0 @@
-<tool id="gbkToORF" name="Extract ORF" version="0.1">
-    <description>from a GenBank file</description>
-    <command interpreter="python">
-        gbk_to_orf.py
-            -g $infile
-            -a $aa_output
-            -n $nc_output
-            ##TODO translation table, can be extracted from genbank file directly
-    </command>
-    <inputs>
-        <param name="infile" type='data' format="genbank" label="GenBank file"/>
-    </inputs>
-    <outputs>
-        <data name="aa_output" format="fasta" />
-        <data name="nc_output" format="fasta" />
-    </outputs>
-    <tests>
-        <test>
-        </test>
-    </tests>
-    <help>
-
-
-**What it does**
-Read a GenBank file and export FASTA-formatted amino acid and CDS files.
-
-
------
-
-**Example**
-	* input::
-	
-		GenBank file
-
-			LOCUS       BA000030             9025608 bp    DNA     linear   BCT 21-DEC-2007
-		DEFINITION  Streptomyces avermitilis MA-4680 DNA, complete genome.
-		ACCESSION   BA000030 AP005021-AP005050
-		VERSION     BA000030.3  GI:148878541
-		DBLINK      Project: 189
-		KEYWORDS    .
-		SOURCE      Streptomyces avermitilis MA-4680
-		  ORGANISM  Streptomyces avermitilis MA-4680
-			    Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
-			    Streptomycineae; Streptomycetaceae; Streptomyces.
-		REFERENCE   1
-		  AUTHORS   Omura,S., Ikeda,H., Ishikawa,J., Hanamoto,A., Takahashi,C.,
-			    Shinose,M., Takahashi,Y., Horikawa,H., Nakazawa,H., Osonoe,T.,
-			    Kikuchi,H., Shiba,T., Sakaki,Y. and Hattori,M.
-		  TITLE     Genome sequence of an industrial microorganism Streptomyces
-			    avermitilis: deducing the ability of producing secondary
-			    metabolites
-		  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 98 (21), 12215-12220 (2001)
-		   PUBMED   11572948
-		REFERENCE   2
-		  AUTHORS   Ikeda,H., Ishikawa,J., Hanamoto,A., Shinose,M., Kikuchi,H.,
-			    Shiba,T., Sakaki,Y., Hattori,M. and Omura,S.
-		  TITLE     Complete genome sequence and comparative analysis of the industrial
-			    microorganism Streptomyces avermitilis
-		  JOURNAL   Nat. Biotechnol. 21 (5), 526-531 (2003)
-		   PUBMED   12692562
-		REFERENCE   3  (bases 1 to 9025608)
-		  AUTHORS   Omura,S., Ikeda,H., Ishikawa,J., Hanamoto,A., Takahashi,C.,
-			    Shinose,M., Takahashi,Y., Horikawa,H., Nakazawa,H., Osonoe,T.,
-			    Kushida,N., Shiba,T., Sakaki,Y. and Hattori,M.
-		  TITLE     Direct Submission
-		  JOURNAL   Submitted (29-MAR-2002) Contact:S Omura Kitasato University,
-			    Kitasato Institute for Life Sciences; 1-15-1 Kitasato, Sagamihara,
-			    Kanagawa 228-8555, Japan URL
-			    :http://avermitilis.ls.kitasato-u.ac.jp/
-		COMMENT     On Jun 15, 2007 this sequence version replaced gi:57546753.
-			    This work was done in collaboration with Haruo Ikeda(*1), Jun
-			    Ishikawa(*2), Akiharu Hanamoto(*3), Chigusa Takahashi(*3), Mayumi
-			    Shinose(*3), Hiroshi Horikawa(*4), Hidekazu Nakazawa(*4), Tomomi
-			    Osonoe(*4), Norihiro Kushida(*4), Hisashi Kikuchi(*4), Tadayoshi
-			    Shiba(*5), Yoshiyuki Sakaki(*6,*7), Masahira Hattori(*1,*7)
-			    and Satoshi Omura(*1,*3).
-			    Final finishing process and all annotation were done by H. Ikeda
-			    and J. Ishikawa.
-			    *1 Kitasato Institute for Life Sciences, Kitasato University *2
-			    National Institute of Infectious Diseases
-			    *3 The Kitasato Institute
-			    *4 National Institute of Technology and Evaluation *5 School of
-			    Science, Kitasato University
-			    *6 Institute of Medical Science, University of Tokyo *7 RIKEN,
-			    Genomic Sciences Center
-			    All the annotated genes identified are available from following
-			    urls.
-			    http://avermitilis.ls.kitasato-u.ac.jp.
-		FEATURES             Location/Qualifiers
-		     source          1..9025608
-				     /organism="Streptomyces avermitilis MA-4680"
-				     /mol_type="genomic DNA"
-				     /strain="MA-4680"
-				     /db_xref="taxon:227882"
-				     /note="This strain is also named as strain: ATCC 31267,
-				     NCIMB 12804 or NRRL 8165."
-		     gene            complement(1380..1811)
-				     /locus_tag="SAV_1"
-		     CDS             complement(1380..1811)
-				     /locus_tag="SAV_1"
-				     /codon_start=1
-				     /transl_table=11
-				     /product="hypothetical protein"
-				     /protein_id="BAC67710.1"
-				     /db_xref="GI:29603637"
-				     /translation="MTAEWYVLVEEDTRETKRADGVELRLHRWKLAATQHIAGDQEQA
-				     AAAAEDAALNYMPGVLARHARPGDEPARHAFLTQDGAWLVLLRQRHRECHIRVTTARL
-				     MHTQEEKEAPPKSFKEKLRSALDGPQPPEPAGRPWKPGSET"
-
-
-* output::
-	
-	-  aminoAcidOutput
-	>SAV_1 
-	MTAEWYVLVEEDTRETKRADGVELRLHRWKLAATQHIAGDQEQAAAAAEDAALNYMPGVL
-	ARHARPGDEPARHAFLTQDGAWLVLLRQRHRECHIRVTTARLMHTQEEKEAPPKSFKEKL
-	RSALDGPQPPEPAGRPWKPGSET
-	>SAV_2 
-	VPPQGARGTIVSATGSGKTSMAAASTLNCFPEGRILVTVPTLDLLAQTAQAWRAVGHHSP
-	MIAVCSLENDPVLNERT
-	>SAV_3 
-	MDWNFPDDDIFFCGGCGDDDTPDPRVPRQDKALCVRCDRVERQVRRYRITVPRRNAIMRF
-	QRDVCALCQEGPPTDHCPDAVSFWHIDHDHRCCPPGGSCGRCVRGLLCLPCNATRLPAYE
-	RLPNVLRDSPRFNTYLNSPPARHPEARPTARDHAGPRDASSYLIDAFFTAADHPEGNALS
-	S
-	>SAV_4 
-	VALTPGGTRVTQWQDRQAIGDMHERRVAAALRARGWTVQPCGQGTYPPAVREALRRTRSA
-	LRHFPDLIAARGADLITIDAKDRMPSTDTDRYAVSADTVTAGLFFTAAHAPTPLYYVFGD
-	LKVLTPAEVVHYTAHALRHRSGAFHLVRTEQAHCFDDVFGSAGAAAAA
-	>SAV_5 
-	MMLLMAAYVDPRFRPTLWPGTPVPTPELMPLRGARADGEWIVWTPQVRSRSHTVPVPEDF
-	YLREFMEVDPEDLDAVAALMGAYGHLGGSINTGSWDVDVYERLKELTEREHPRAPFALHG
-	ELATLFMREAQAAITTWLALRREGGLDALIEPEVSEEELAQWQASNADLEEAWPRDLDHL
-	RELSLEIRISNLVSELNAALKPFSIGIGGLGDRYPTILAVAFLQLYNHLAEDATIRECAN
-	ETCRRHFVRQRGRAAYGQNRTSGIKYCTRECARAQAQREHRRRRKQQTTTLQQPPAPGPQ
-	SHDTSEPTAEGR
-	>SAV_6 
-	MISLREHQVEANARIRAWAGFPTRSPVPAQGLRGTVVSATGSGKTITAAWAARECFRGGR
-	ILVMVPTLDLLVQTAQAWRRVGHNGPMVAACSLEKDEVLEQLGVRTTTNPIQLALWAGHG
-	PVVVFATYASLVDREDPEDVTGRAKVRGPLEAALAGGQRLYGQTMDGFDLAVVDEAHSTT
-	GDLGRPWAAIHDNSRIPADFRLYLTATPRILASPRPQKGADGRELEIATMASDPDGPYGE
-	WLFELGLSEAVERGILAGFEIDVLEIRDPSPALGESEEAQRGRRLALLQTALLEHAAARN
-	LRTVMTFHQRVEEAAAFAQTMPQTAARLYEAEVSAEALVDAGALPESSIGAEFYELEAGR
-	HVPPDRVWAAWLCGDHLVAERREVLRQFADGLDAGNKRVHRAFLASVRVLGEGVDIVGER
-	GVEAICFADTRGSQVEIVQNIGRALRPNPDGTNKTARIIVPVFLQPGENPTDMVASASFA
-	PLVTVLQGLRSHSERLVEQLASRALTSGQRHVHVKRDEDGRIIGTTTEGEGGQHESEGAV
-	ESALLHFSTPRDATTIAAFLRTRVYRPESLVWLEGYQALLRWRKKNHITGLYAVPYDTET
-	EAGVTKAFPLGRWVHQQRRTYRAGELDPHRTTLLDEAGMVWEPGDEAWENKLAALRSFHR
-	AHGHLAPRRDAVWGDADSELVPVGEHMANLRRKDGLGKNPQRAATRATQLAAIDPDWNCP
-	WPLDWQRHYRVLADLATDEPHSRLPDIQPGVQFEGDDLGKWLQRQRRSWAELSEEQQQRL
-	TALGVTPAEPPTPTPSAKGGGKAAAFQRGLAALAQWIQREGAHKVVPRGHVEAVVIDGQE
-	HQHKLGVWISNTKTRRDKLTHDQRTALAALGVEWA
-	....
-
-	- orfs
-
-	>SAV_1 
-	ATGACCGCCGAGTGGTACGTCCTCGTCGAAGAGGACACACGAGAGACCAAGCGCGCCGAC
-	GGCGTTGAACTCAGATTGCACCGCTGGAAACTGGCGGCCACTCAGCACATCGCAGGAGAT
-	CAGGAACAGGCCGCCGCCGCGGCCGAGGATGCGGCCCTGAACTACATGCCGGGAGTGCTC
-	GCTCGGCATGCCCGACCGGGAGACGAACCGGCCCGGCATGCTTTCCTCACCCAGGACGGG
-	GCCTGGCTGGTGCTCCTCAGGCAGCGGCACCGCGAGTGTCACATACGGGTGACCACTGCC
-	CGGCTCATGCATACACAGGAAGAGAAGGAGGCCCCGCCGAAAAGCTTCAAGGAGAAACTC
-	CGCAGCGCCCTGGATGGTCCTCAGCCGCCCGAACCGGCTGGTAGGCCATGGAAGCCGGGC
-	AGCGAAACCTGA
-	>SAV_2 
-	GTGCCCCCTCAGGGAGCCCGTGGCACGATCGTGTCAGCTACCGGGTCCGGCAAAACGAGC
-	ATGGCCGCCGCGAGCACGCTGAACTGCTTCCCCGAAGGCCGGATCCTCGTGACCGTGCCG
-	ACCCTGGACCTGCTCGCACAGACCGCCCAGGCGTGGCGGGCAGTCGGCCACCACTCCCCC
-	ATGATCGCGGTGTGCTCGCTGGAGAACGACCCAGTGCTGAACGAGCGGACCTGA
-	>SAV_3 
-	ATGGACTGGAACTTCCCCGACGACGACATCTTCTTCTGCGGCGGGTGCGGCGACGACGAC
-	ACCCCCGACCCGCGGGTCCCGCGTCAGGACAAGGCCCTGTGCGTCCGCTGCGACAGAGTC
-	GAACGGCAGGTCCGCCGATACCGGATCACCGTGCCGCGGAGGAACGCGATCATGCGCTTC
-	CAGCGCGACGTCTGCGCCCTGTGCCAGGAAGGCCCGCCGACCGACCACTGCCCCGATGCC
-	GTCAGCTTCTGGCACATCGACCACGACCACCGCTGCTGCCCTCCCGGCGGCTCATGCGGG
-	CGGTGCGTCCGCGGCCTCCTGTGCCTGCCCTGCAACGCCACCCGCCTGCCCGCCTACGAA
-	CGCCTCCCCAACGTCCTCCGCGACAGCCCTCGCTTCAACACCTACCTCAACAGCCCACCC
-	GCCCGGCACCCCGAAGCCCGCCCCACCGCCAGGGACCATGCAGGCCCCCGCGACGCATCC
-	AGCTACCTCATCGACGCCTTTTTCACCGCCGCGGACCATCCCGAGGGGAACGCCCTCAGC
-	TCCTGA
-	>SAV_4 
-	GTGGCACTTACCCCAGGGGGAACCCGAGTGACGCAGTGGCAGGACCGCCAGGCGATAGGC
-	GACATGCACGAACGTCGGGTGGCGGCCGCGCTGCGCGCCCGCGGCTGGACCGTCCAGCCC
-	TGCGGACAGGGCACCTACCCGCCCGCCGTACGGGAAGCCCTGCGCCGGACCCGCTCCGCC
-	CTGCGGCACTTCCCCGACCTCATCGCCGCCCGCGGCGCCGACCTGATCACCATCGACGCC
-	AAGGACCGCATGCCCAGCACCGACACCGACCGCTACGCCGTCAGCGCCGACACCGTGACC
-	GCCGGCCTCTTTTTCACCGCGGCCCACGCTCCGACTCCGCTGTACTACGTCTTCGGCGAC
-	CTGAAGGTCCTCACGCCGGCGGAGGTGGTCCACTACACCGCTCACGCCTTGCGCCACCGC
-	AGCGGTGCCTTCCACCTCGTACGCACGGAGCAAGCACACTGCTTCGACGACGTCTTCGGA
-	TCGGCTGGCGCAGCAGCTGCGGCATGA
-	>SAV_5 
-	ATGATGCTCCTCATGGCGGCATACGTTGACCCACGCTTTCGTCCTACGCTATGGCCTGGA
-	ACGCCCGTGCCGACACCGGAGTTGATGCCTCTTCGCGGAGCGCGGGCCGACGGTGAATGG
-	ATCGTCTGGACCCCGCAGGTCCGCTCCCGCTCGCACACGGTCCCCGTGCCGGAGGACTTC
-	TACCTGCGCGAGTTCATGGAGGTCGACCCTGAGGACCTCGACGCCGTGGCCGCCCTGATG
-	GGCGCCTACGGACACCTCGGCGGGAGCATCAACACCGGAAGCTGGGACGTCGACGTCTAC
-	GAGCGCCTCAAGGAGCTCACGGAGCGCGAACACCCCCGCGCGCCGTTCGCCCTGCACGGC
-	GAACTGGCCACGCTGTTCATGAGGGAGGCGCAGGCGGCCATCACCACCTGGCTGGCCCTG
-	CGCCGCGAGGGCGGGCTCGACGCGCTCATCGAGCCCGAGGTGTCCGAGGAAGAACTGGCG
-	CAGTGGCAAGCGAGCAACGCTGATCTTGAGGAAGCGTGGCCGCGGGACCTGGACCACCTG
-	CGCGAACTCTCCCTGGAGATCAGGATCAGCAACCTCGTGAGCGAACTGAACGCCGCGCTG
-	AAGCCGTTCAGCATCGGCATCGGCGGCCTGGGCGACCGCTACCCCACCATCCTCGCTGTG
-	GCGTTCCTCCAGCTCTACAACCACCTCGCCGAGGACGCCACGATCCGCGAGTGCGCGAAC
-	GAGACCTGCCGCCGCCACTTCGTACGCCAGCGCGGCCGCGCCGCATACGGGCAGAACCGC
-	ACCAGCGGCATCAAGTACTGCACCCGCGAATGCGCCCGCGCCCAGGCCCAGCGCGAACAC
-	CGCCGGCGCCGCAAACAGCAGACCACGACCCTCCAGCAGCCGCCGGCGCCTGGTCCTCAG
-	TCTCACGACACCTCAGAGCCGACTGCCGAAGGGCGCTGA
-	.......
-
-    </help>
-</tool>
--- a/additional/gbk_to_orf.py	Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,61 +0,0 @@
-#!/usr/bin/env python
-
-###################################################################
-##
-## gbk2orf.py by Errol Strain (estrain@gmail.com)
-##
-## Read a GenBank file and export fasta formatted amino acid and 
-## CDS files
-##
-###################################################################
-
-import sys
-from optparse import OptionParser
-from Bio import SeqIO
-from Bio.Seq import Seq
-from Bio.SeqRecord import SeqRecord
-
-
-## Command line usage
-usage = "usage: %prog -g input.gbk -a aa.fasta -n nuc.fasta" 
-p = OptionParser(usage)
-p.add_option("-t","--translate", dest="transtabl",type="int",default=11,
-  help="Translation table used to translate coding regions (default=11)")
-p.add_option("-g","--genbank", dest="gb_file",help="GenBank input file")
-p.add_option("-a","--amino_acid", dest="aa_file",help="Fasta amino acid output")
-p.add_option("-n","--nucleotide", dest="orf_file",help="Fasta nucleotide output")
-(opts, args) = p.parse_args()
-## Do I need this next line?
-if not opts and not args : p.error("Use --help to see usage")
-if len(sys.argv)==1 : p.error("Use --help to see usage") 
-
-## Lists to hold SeqRecords
-aalist = []
-nuclist = []
-
-## If the CDS does not have a locus tag the name will be assigned using the
-## order in which it was found
-feat_count=0
-
-## Iterate through genbank records in input file
-for gb_record in SeqIO.parse(open(opts.gb_file,"r"), "genbank") :
-  for (index, feature) in enumerate(gb_record.features) :
-    if feature.type=="CDS" :
-      feat_count = feat_count + 1
-      gene = feature.extract(gb_record.seq)
-      if "locus_tag" in feature.qualifiers :
-        value = feature.qualifiers["locus_tag"][0]
-      else :
-        value =  "Index_" + str(feat_count)
-      nuclist.append(SeqRecord(Seq(str(gene)),id=value,name=value))
-      pro=Seq(str(gene.translate(table=opts.transtabl,to_stop=True)))
-      aalist.append(SeqRecord(pro,id=value,name=value))
-
-## Write out lists in fasta format
-aa_handle = open(opts.aa_file,"w")
-SeqIO.write(aalist,aa_handle,"fasta")
-aa_handle.close()
-orf_handle = open(opts.orf_file,"w")
-SeqIO.write(nuclist,orf_handle,"fasta")
-orf_handle.close()
-
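The "Do I need this next line?" comment in the removed script can be answered with a small sketch. `optparse` always returns an options object, so `not opts` is never true, and nothing checks that `-g`, `-a` and `-n` were actually supplied. The version below is a hypothetical rewrite using `argparse`, not code from this repository: the helper names `build_parser` and `cds_name` are assumptions made for illustration, while the option names and the `Index_<n>` fallback are taken from the deleted file.

```python
import argparse


def build_parser():
    """Build a parser with the same options as the removed gbk_to_orf.py."""
    parser = argparse.ArgumentParser(
        description="Export amino acid and CDS FASTA files from a GenBank file"
    )
    parser.add_argument("-t", "--translate", dest="transtabl", type=int, default=11,
                        help="Translation table used to translate coding regions")
    # required=True replaces the manual 'Use --help to see usage' checks.
    parser.add_argument("-g", "--genbank", dest="gb_file", required=True,
                        help="GenBank input file")
    parser.add_argument("-a", "--amino_acid", dest="aa_file", required=True,
                        help="FASTA amino acid output")
    parser.add_argument("-n", "--nucleotide", dest="orf_file", required=True,
                        help="FASTA nucleotide output")
    return parser


def cds_name(qualifiers, feat_count):
    """Use the locus_tag if present, otherwise a name based on the CDS order."""
    if "locus_tag" in qualifiers:
        return qualifiers["locus_tag"][0]
    return "Index_" + str(feat_count)


# Usage with an explicit argument list (file names are placeholders).
opts = build_parser().parse_args(["-g", "input.gbk", "-a", "aa.fasta", "-n", "nuc.fasta"])
print(opts.gb_file, opts.transtabl)
print(cds_name({"locus_tag": ["SAV_1"]}, 1), cds_name({}, 2))
```

With `required=True`, a call without the three file options exits with a usage message, so neither of the two manual checks is needed.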
--- a/additional/glimmer2gff.py	Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,36 +0,0 @@
-#!/usr/bin/env python
-
-"""
-Input: Glimmer3 prediction
-Output: GFF3 file
-Return a GFF3 file with the genes predicted by Glimmer3
-Bjoern Gruening
-
-Note: It's not a full-fledged GFF3 file; it's a really simple one.
-
-"""
-
-import sys, re
-
-def __main__():
-    input_file = open(sys.argv[1], 'r')
-
-    print '##gff-version 3\n'
-    for line in input_file:
-        line = line.strip()
-        if line[0] == '>':
-            header = line[1:]
-        else:
-            (id, start, end, frame, score) = re.split('\s+', line)
-            if int(end) > int(start):
-                strand = '+'
-            else:
-                strand = '-'
-                (start, end) = (end, start)
-
-            rest = 'frame=%s;score=%s' % (frame, score)
-            print '\t'.join([header, 'glimmer_prediction', 'predicted_gene', start, end, '.', strand, '.', rest])
-
-
-if __name__ == "__main__" :
-    __main__()
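The removed converter is Python 2 only (`print` statements) and fails on empty lines because it indexes `line[0]`. A minimal Python 3 sketch of the same conversion follows. The function name `glimmer_to_gff3` is an assumption for illustration; the column layout and the `glimmer_prediction` / `predicted_gene` labels are copied from the deleted script. Unlike the original, it skips blank lines and does not emit an extra empty line after the `##gff-version 3` header.

```python
import re


def glimmer_to_gff3(lines):
    """Yield simple GFF3 lines for Glimmer3 predictions.

    Sequence headers start with '>'; every other non-empty line holds
    'id start end frame score', separated by whitespace.
    """
    yield "##gff-version 3"
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines instead of raising IndexError
        if line.startswith(">"):
            header = line[1:]
            continue
        orf_id, start, end, frame, score = re.split(r"\s+", line)
        if int(end) > int(start):
            strand = "+"
        else:
            # Reverse-strand predictions list the larger coordinate first.
            strand = "-"
            start, end = end, start
        attributes = "frame=%s;score=%s" % (frame, score)
        yield "\t".join([header, "glimmer_prediction", "predicted_gene",
                         start, end, ".", strand, ".", attributes])


example = [
    ">contig00097 sbe.0.234",
    "orf00003     2869      497  -2     5.60",
    "orf00007     4242     4826  +3     8.04",
]
for row in glimmer_to_gff3(example):
    print(row)
```

Like the original, this keeps the full header line as the first column and ignores the ORF identifier.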
--- a/additional/glimmer2gff.xml	Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,63 +0,0 @@
-<tool id="glimmer2gff" name="Convert Glimmer to GFF" version="0.1">
-    <description>Converts Glimmer Files to GFF Files</description>
-    <command interpreter="python">
-        glimmer2gff.py
-            $input > $output
-    </command>
-    <inputs>
-        <param name="input" type="data" format="tabular" label="Glimmer Output File"/>
-    </inputs>
-    <outputs>
-        <data name="output" type="data" format="gff"/>
-    </outputs>
-    <tests>
-        <test>
-
-        </test>
-    </tests>
-    <help>
-
-**What it does**
-
-Converts a Glimmer3 output file to a GFF annotation file.
-
-**Example**
-
-Input::
-    >contig00097 sbe.0.234 
-    orf00003     2869      497  -2     5.60
-    orf00005     3894     2875  -1     7.05
-    orf00007     4242     4826  +3     8.04
-    orf00010     4846     5403  +1     8.57
-    orf00012     6858     5413  -1    10.87
-    orf00013     6857     7594  +2     3.61
-    orf00014     7751     9232  +2    11.34
-    orf00015     9374    10357  +2    10.66
-    orf00017    10603    11196  +1    13.39
-    orf00021    11303    11911  +2     8.81
-    orf00025    14791    12050  -2    13.51
-    orf00026    15216    16199  +3     6.37
-    orf00028    16333    16935  +1     8.86
-
-
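-Output::
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	497	2869	.	-	.	frame=-2;score=5.60
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	2875	3894	.	-	.	frame=-1;score=7.05
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	4242	4826	.	+	.	frame=+3;score=8.04
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	4846	5403	.	+	.	frame=+1;score=8.57
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	5413	6858	.	-	.	frame=-1;score=10.87
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	6857	7594	.	+	.	frame=+2;score=3.61
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	7751	9232	.	+	.	frame=+2;score=11.34
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	9374	10357	.	+	.	frame=+2;score=10.66
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	10603	11196	.	+	.	frame=+1;score=13.39
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	11303	11911	.	+	.	frame=+2;score=8.81
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	12050	14791	.	-	.	frame=-2;score=13.51
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	15216	16199	.	+	.	frame=+3;score=6.37
-    contig00097 sbe.0.234	glimmer_prediction	predicted_gene	16333	16935	.	+	.	frame=+1;score=8.86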
-
-
------
-
-
-    </help>
-</tool>
--- a/additional/glimmer3-extract-wrapper.xml	Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,127 +0,0 @@
-<tool id="glimmer_extract" name="glimmer3-extract" version="0.1">
-    <description></description>
-    <requirements>
-        <requirement type="package" version="3.02b">glimmer</requirement>
-    </requirements>
-    <command>
-    extract
-        -t
-        $seqInput
-        $cordInput > $output
-        2> /dev/null
-    </command>
-    <inputs>
-        <param name="seqInput" type="data" format="fasta" label="Genome Sequence" help="Dataset missing? See TIP below"/>
-        <param name="cordInput" type="data"  label="Coordinates" help="Dataset missing? See TIP below"/>
-    </inputs>
-    <outputs>
-        <data format="fasta" name="output" />
-    </outputs>
-    <tests>
-        <test>
-            <param name="seqInput" value='glimmer3/seqTest.fa'/>
-            <param name="cordInput" value='glimmer3/cordTest.txt'/>
-            <output name="output" file='glimmer3/extractTestOutput.dat'/>
-        </test>
-    </tests>
-
-    <help>
-
-**What it does**
-
-	This program reads a genome sequence and a list of coordinates for it, and outputs a
-	multi-FASTA file of the regions specified by the coordinates.
-
------
-
-**Glimmer Overview**
-
-::
-
-**************		**************		**************		**************		
-*            *		*	     *		*            *		*            *
-* long-orfs  *  ===>	*   Extract  *	===>	* build-icm  *  ===>	*  glimmer3  *	
-*            *		*	     *		*	     *  	*	     *	
-**************		**************		**************		**************
-
------
-
-**Example**
-
-
-* input ::
-	
-	-Genome Sequence
-
-	CELF22B7  C.aenorhabditis elegans (Bristol N2) cosmid F22B7
-	GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT
-	GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT
-	TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT
-	TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC
-	GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA
-	ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG
-	AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA
-	CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA
-	TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC
-	AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA
-	GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC
-	AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC
-	CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA
-	AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC
-	GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT
-	...
-
-	- Coordinates
-
-	00001   40137      52  +2   0.892
-	00002    1319    1095  -3   0.654
-	00003    1555    1391  -2   0.793
-	00004    1953    2066  +3   1.078
-	00005    2045    2146  +2   0.919
-	00006    4463    4759  +2   0.985
-	00007    6785    6582  -3   1.033
-	00008    6862    7020  +1   0.915
-	00009    7300    7488  +1   0.900
-	00010    7463    7570  +2   0.912
-	00011    8399    8527  +2   1.044
-	00012   10652   10545  -3   0.895
-	00013   12170   12066  -3   1.108
-	00014   13891   13748  -2   0.998
-	00015   14157   14044  -1   1.026
-	00016   15285   15410  +3   0.928
-	00017   15829   15704  -2   0.949
-	...	
-
-* output::
- 
-		>00001  40137 52  len=135
-		ATGACACATTTGCTCGTTGCTTTGACCCACTACGAGGCCAGTATCATGATTTCTAGAAAA
-		ACCCTCTTTTTGACTTCTTCCTCCATGATCCTTGTAGATTTTGAATTTGAAGTTTTTTCT
-		CATTCCAAAACTCTG
-	
-		>00002  1319 1095  len=222
-		TTGGCTCGCCGTTTTGGAGTCCGTGCTGGAATGCCTGGCTTCATCTCAAATAAACTTTGT
-		CCGAGTCTAACGATTGTTCCAGGAAATTACCCTAAATACACTAAAGTCAGTCGCCAATTT
-		TCACAAATTTTCATGGAATACGATTCGGATGTTGGAATGATGTCATTGGATGAGGCATTT
-		ATAGATTTGACAGACTATGTGGCAAGTAATACAGAAAAAAGT
-	
-		>00003  1555 1391  len=162
-		ATGGAGAATCTTGAGATGAAACTGGAATCATCTAGAGATTTATCAAGAGACTGTGTTTGT
-		ATAGATATGGATGCTTATTTTGCCGCAGTTGAAATGAGAGATAATCCTGCACTGAGAACA
-		GTTCCTATGGCCGTAGGCTCATCGGCAATGCTGGTAAGCACC
-	
-		>00004  1953 2066  len=111
-		GTGCGCGAGAAAAAACTACGCGTTAACCGCCAATTTTCACTTCCCCACAGATCTGTCTCG
-		AGATTCTCGAGTCATTTTTCAAGTTTATTTGTTTGTCAGCGGTTGTTTTAT
-		.....
-
--------
-
-**References**
-
-A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007).
-
-
-
-    </help>
-</tool>
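The coordinate arithmetic in the removed help text can be checked with a short sketch. In the example, ORF 00001 runs from 40137 to 52 on a 40222 bp circular sequence and is reported with len=135, and ORF 00002 runs from 1319 down to 1095 with len=222. Both values are three bases shorter than the coordinate span, which is consistent with the stop codon being left out. The code below is an illustration under those assumptions and not Glimmer's implementation: the function `extract_region` and its `forward` flag (taken from the sign of the frame column) are hypothetical.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def extract_region(genome, start, end, forward, omit_stop=True):
    """Return the region from start to end (1-based, inclusive) of a circular genome.

    'start' is the first base in reading direction. Reverse-strand regions are
    read downwards from 'start' and complemented base by base.
    """
    n = len(genome)
    if forward:
        length = (end - start) % n + 1  # modulo handles wrap-around the origin
        region = "".join(genome[(start - 1 + i) % n] for i in range(length))
    else:
        length = (start - end) % n + 1
        region = "".join(genome[(start - 1 - i) % n] for i in range(length))
        region = region.translate(COMPLEMENT)
    if omit_stop:
        region = region[:-3]  # assumption: the final codon is the stop codon
    return region


toy = "AAACCCGGGTTT"
print(extract_region(toy, 11, 4, forward=True))   # wraps around the end of the sequence
print(extract_region(toy, 6, 1, forward=False))   # reverse complement of bases 1..6

# Span lengths for the help-text example, on a sequence of 40222 bases.
print((52 - 40137) % 40222 + 1 - 3, (1319 - 1095) % 40222 + 1 - 3)
```

The last line reproduces the two reported lengths, 135 and 222, from the coordinates alone.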
--- a/additional/glimmer3-long-orfs-wrapper.xml	Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,125 +0,0 @@
-<tool id="glimmer_long-orfs" name="long ORFs" version="0.1">
-    <description>identify long, non-overlapping ORFs (glimmer)</description>
-    <requirements>
-        <requirement type="package" version="3.02b">glimmer</requirement>
-    </requirements>
-    <command>
-    long-orfs
-        -n -t
-        $cutoff
-        $inputfile
-        $output
-        2>&#38;1
-    </command>
-    <inputs>
-        <param name="inputfile" type="data" format="fasta" label="Genome Sequence" help="Dataset missing? See TIP below"/>
-        <param name='cutoff' type='float' label='cutoff' value='1.5'/>
-    </inputs>
-    <outputs>
-        <data format="tabular" name="output" />
-    </outputs>
-    <tests>
-        <test>
-            <param name="inputfile" value='glimmer3/seqTest.fa'/>
-            <param name='cutoff' value='1.5'/>
-            <output name="output" file='glimmer3/longORFSTestOutput.dat'/>
-        </test>
-    </tests>
-    <help>
-
-**What it does**
-
-    This program identifies long, non-overlapping open reading frames (orfs) in a DNA sequence file.
-    These orfs are very likely to contain genes, and can be used as a set of training sequences for build-icm.
-    More specifically, among all orfs longer than a minimum length, those that do not overlap any others are output. The start codon used for
-    each orf is the first possible one. The program, by default, automatically determines the
-    minimum length that maximizes the number of orfs that are output. With the -t option, the initial
-    set of candidate orfs can also be filtered using entropy distance, which generally produces
-    a larger, more accurate training set, particularly for high-GC-content genomes.
-
-
-
------
-
-**Glimmer Overview**
-
-::
-
-**************		**************		**************		**************		
-*            *		*	     *		*            *		*            *
-* long-orfs  *  ===>	*   Extract  *	===>	* build-icm  *  ===>	*  glimmer3  *	
-*            *		*	     *		*	     *  	*	     *	
-**************		**************		**************		**************
-
------
-
-**Example**
-
-
-* input::
- 
-	-Genome Sequence
-
-	CELF22B7  C.aenorhabditis elegans (Bristol N2) cosmid F22B7
-	GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT
-	GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT
-	TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT
-	TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC
-	GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA
-	ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG
-	AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA
-	CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA
-	TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC
-	AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA
-	GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC
-	AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC
-	CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA
-	AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC
-	GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT
-	.....
-	
-	- Cutoff 1.5	
-
-* output::
-
-	Sequence file = /home/mohammed/galaxy-central/database/files/000/dataset_34.dat
-	Excluded regions file = none
-	Circular genome = true
-	Initial minimum gene length = 90 bp
-	Determine optimal min gene length to maximize number of genes
-	Maximum overlap bases = 30
-	Start codons = atg,gtg,ttg
-	Stop codons = taa,tag,tga
-	Sequence length = 40222
-	Final minimum gene length = 97
-
-	Putative Genes:
-	00001   40137      52  +2   0.892
-	00002    1319    1095  -3   0.654
-	00003    1555    1391  -2   0.793
-	00004    1953    2066  +3   1.078
-	00005    2045    2146  +2   0.919
-	00006    4463    4759  +2   0.985
-	00007    6785    6582  -3   1.033
-	00008    6862    7020  +1   0.915
-	00009    7300    7488  +1   0.900
-	00010    7463    7570  +2   0.912
-	00011    8399    8527  +2   1.044
-	00012   10652   10545  -3   0.895
-	00013   12170   12066  -3   1.108
-	00014   13891   13748  -2   0.998
-	00015   14157   14044  -1   1.026
-	00016   15285   15410  +3   0.928
-	00017   15829   15704  -2   0.949
-
-	....
-
--------
-
-**References**
-
-A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007).
-
-
-    </help>
-</tool>
--- a/additional/glimmer_acgt_content.xml	Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,55 +0,0 @@
-<tool id="glimmer_acgt-content" name="ACGT Content" version="0.1">
-    <description>of windows in each sequence</description>
-    <requirements>
-        <requirement type="package" version="3.02b">glimmer</requirement>
-    </requirements>
-    <command>
-        window-acgt
-            $percentage
-            $input_win_len
-            $input_win_skip
-            &lt; $infile > $output
-
-            ##TODO prettify the output
-    </command>
-    <inputs>
-        <param name="infile" type="data" format="fasta" label="Genome Sequence"/>
-        <param name="input_win_len" type="integer" value="10" label="The width of the sliding window"/>
-        <param name="input_win_skip" type="integer" value="10" label="The number of positions between windows to report"/>
-        <param name="percentage" type="boolean" truevalue="-p" falsevalue="" checked="true" label="Report percentages instead of counts"/>
-    </inputs>
-    <outputs>
-        <data name="output" format="tabular"/>
-    </outputs>
-    <tests>
-        <test>
-            <param name="infile" value="streptomyces_coelicolor.dna" />
-            <output name="output" file="fasta_tool_convert_from_dna.out" />
-        </test>
-    </tests>
-    <help>
-
-**What it does**
-
-This tool calculates the ACGT content of a given sequence using a sliding window.
-
--------
-
-**Output**
-
-Output is in the format:
-
-	window-start	window-len	A's	C's	G's	T's	#other	%GC
-
-Note that the last window in the sequence can be shorter than *window-len* if the sequence ends before the window is full.
-
-
-
-
-**References**
-
-A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007).
-
-
-    </help>
-</tool>
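The output columns described in the removed help text (window-start, window-len, A's, C's, G's, T's, #other, %GC) can be illustrated with a pure-Python sketch. This is not the Glimmer `window-acgt` program: the function name `window_acgt` is hypothetical, and the choice to compute %GC over the full window length (including non-ACGT characters) is an assumption that may differ from what the real tool does.

```python
def window_acgt(seq, win_len, win_skip):
    """Count A, C, G, T and other characters in windows along a sequence.

    Returns one tuple per window:
    (window_start, window_len, a, c, g, t, other, percent_gc)
    with a 1-based window_start. The last window may be shorter than win_len.
    """
    seq = seq.lower()
    rows = []
    for start in range(0, len(seq), win_skip):
        window = seq[start:start + win_len]
        a, c, g, t = (window.count(base) for base in "acgt")
        other = len(window) - (a + c + g + t)
        # Assumption: %GC is relative to the window length, not to ACGT only.
        percent_gc = 100.0 * (c + g) / len(window)
        rows.append((start + 1, len(window), a, c, g, t, other, percent_gc))
    return rows


for row in window_acgt("ACGTACGTNNGG", win_len=10, win_skip=10):
    print("\t".join(str(value) for value in row))
```

With `win_len` and `win_skip` both set to 10, as in the removed wrapper's defaults, the windows tile the sequence without overlapping.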