# HG changeset patch
# User bgruening
# Date 1370894434 14400
# Node ID d2eee6e51790fa8139fd49b44140c748d57430e5
# Parent a07c49839f31db1c2c045f748ca4aa8b081f9b94
Uploaded
diff -r a07c49839f31 -r d2eee6e51790 additional/gbk2orf.xml
--- a/additional/gbk2orf.xml Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,212 +0,0 @@
-
- from a GenBank file
-
- gbk_to_orf.py
- -g $infile
- -a $aa_output
- -n $nc_output
- ##TODO translation table, can be extracted from genbank file directly
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-Reads a GenBank file and exports the nucleotide (CDS) and translated amino acid sequences of every CDS feature as FASTA files.
-
-
------
-
-**Example**
- * input::
-
- GenBank file
-
- LOCUS BA000030 9025608 bp DNA linear BCT 21-DEC-2007
- DEFINITION Streptomyces avermitilis MA-4680 DNA, complete genome.
- ACCESSION BA000030 AP005021-AP005050
- VERSION BA000030.3 GI:148878541
- DBLINK Project: 189
- KEYWORDS .
- SOURCE Streptomyces avermitilis MA-4680
- ORGANISM Streptomyces avermitilis MA-4680
- Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
- Streptomycineae; Streptomycetaceae; Streptomyces.
- REFERENCE 1
- AUTHORS Omura,S., Ikeda,H., Ishikawa,J., Hanamoto,A., Takahashi,C.,
- Shinose,M., Takahashi,Y., Horikawa,H., Nakazawa,H., Osonoe,T.,
- Kikuchi,H., Shiba,T., Sakaki,Y. and Hattori,M.
- TITLE Genome sequence of an industrial microorganism Streptomyces
- avermitilis: deducing the ability of producing secondary
- metabolites
- JOURNAL Proc. Natl. Acad. Sci. U.S.A. 98 (21), 12215-12220 (2001)
- PUBMED 11572948
- REFERENCE 2
- AUTHORS Ikeda,H., Ishikawa,J., Hanamoto,A., Shinose,M., Kikuchi,H.,
- Shiba,T., Sakaki,Y., Hattori,M. and Omura,S.
- TITLE Complete genome sequence and comparative analysis of the industrial
- microorganism Streptomyces avermitilis
- JOURNAL Nat. Biotechnol. 21 (5), 526-531 (2003)
- PUBMED 12692562
- REFERENCE 3 (bases 1 to 9025608)
- AUTHORS Omura,S., Ikeda,H., Ishikawa,J., Hanamoto,A., Takahashi,C.,
- Shinose,M., Takahashi,Y., Horikawa,H., Nakazawa,H., Osonoe,T.,
- Kushida,N., Shiba,T., Sakaki,Y. and Hattori,M.
- TITLE Direct Submission
- JOURNAL Submitted (29-MAR-2002) Contact:S Omura Kitasato University,
- Kitasato Institute for Life Sciences; 1-15-1 Kitasato, Sagamihara,
- Kanagawa 228-8555, Japan URL
- :http://avermitilis.ls.kitasato-u.ac.jp/
- COMMENT On Jun 15, 2007 this sequence version replaced gi:57546753.
- This work was done in collaboration with Haruo Ikeda(*1), Jun
- Ishikawa(*2), Akiharu Hanamoto(*3), Chigusa Takahashi(*3), Mayumi
- Shinose(*3), Hiroshi Horikawa(*4), Hidekazu Nakazawa(*4), Tomomi
- Osonoe(*4), Norihiro Kushida(*4), Hisashi Kikuchi(*4), Tadayoshi
- Shiba(*5), Yoshiyuki Sakaki(*6,*7), Masahira Hattori(*1,*7)
- and Satoshi Omura(*1,*3).
- Final finishing process and all annotation were done by H. Ikeda
- and J. Ishikawa.
- *1 Kitasato Institute for Life Sciences, Kitasato University *2
- National Institute of Infectious Diseases
- *3 The Kitasato Institute
- *4 National Institute of Technology and Evaluation *5 School of
- Science, Kitasato University
- *6 Institute of Medical Science, University of Tokyo *7 RIKEN,
- Genomic Sciences Center
- All the annotated genes identified are available from following
- urls.
- http://avermitilis.ls.kitasato-u.ac.jp.
- FEATURES Location/Qualifiers
- source 1..9025608
- /organism="Streptomyces avermitilis MA-4680"
- /mol_type="genomic DNA"
- /strain="MA-4680"
- /db_xref="taxon:227882"
- /note="This strain is also named as strain: ATCC 31267,
- NCIMB 12804 or NRRL 8165."
- gene complement(1380..1811)
- /locus_tag="SAV_1"
- CDS complement(1380..1811)
- /locus_tag="SAV_1"
- /codon_start=1
- /transl_table=11
- /product="hypothetical protein"
- /protein_id="BAC67710.1"
- /db_xref="GI:29603637"
- /translation="MTAEWYVLVEEDTRETKRADGVELRLHRWKLAATQHIAGDQEQA
- AAAAEDAALNYMPGVLARHARPGDEPARHAFLTQDGAWLVLLRQRHRECHIRVTTARL
- MHTQEEKEAPPKSFKEKLRSALDGPQPPEPAGRPWKPGSET"
-
-
-* output::
-
- - aminoAcidOutput
- >SAV_1
- MTAEWYVLVEEDTRETKRADGVELRLHRWKLAATQHIAGDQEQAAAAAEDAALNYMPGVL
- ARHARPGDEPARHAFLTQDGAWLVLLRQRHRECHIRVTTARLMHTQEEKEAPPKSFKEKL
- RSALDGPQPPEPAGRPWKPGSET
- >SAV_2
- VPPQGARGTIVSATGSGKTSMAAASTLNCFPEGRILVTVPTLDLLAQTAQAWRAVGHHSP
- MIAVCSLENDPVLNERT
- >SAV_3
- MDWNFPDDDIFFCGGCGDDDTPDPRVPRQDKALCVRCDRVERQVRRYRITVPRRNAIMRF
- QRDVCALCQEGPPTDHCPDAVSFWHIDHDHRCCPPGGSCGRCVRGLLCLPCNATRLPAYE
- RLPNVLRDSPRFNTYLNSPPARHPEARPTARDHAGPRDASSYLIDAFFTAADHPEGNALS
- S
- >SAV_4
- VALTPGGTRVTQWQDRQAIGDMHERRVAAALRARGWTVQPCGQGTYPPAVREALRRTRSA
- LRHFPDLIAARGADLITIDAKDRMPSTDTDRYAVSADTVTAGLFFTAAHAPTPLYYVFGD
- LKVLTPAEVVHYTAHALRHRSGAFHLVRTEQAHCFDDVFGSAGAAAAA
- >SAV_5
- MMLLMAAYVDPRFRPTLWPGTPVPTPELMPLRGARADGEWIVWTPQVRSRSHTVPVPEDF
- YLREFMEVDPEDLDAVAALMGAYGHLGGSINTGSWDVDVYERLKELTEREHPRAPFALHG
- ELATLFMREAQAAITTWLALRREGGLDALIEPEVSEEELAQWQASNADLEEAWPRDLDHL
- RELSLEIRISNLVSELNAALKPFSIGIGGLGDRYPTILAVAFLQLYNHLAEDATIRECAN
- ETCRRHFVRQRGRAAYGQNRTSGIKYCTRECARAQAQREHRRRRKQQTTTLQQPPAPGPQ
- SHDTSEPTAEGR
- >SAV_6
- MISLREHQVEANARIRAWAGFPTRSPVPAQGLRGTVVSATGSGKTITAAWAARECFRGGR
- ILVMVPTLDLLVQTAQAWRRVGHNGPMVAACSLEKDEVLEQLGVRTTTNPIQLALWAGHG
- PVVVFATYASLVDREDPEDVTGRAKVRGPLEAALAGGQRLYGQTMDGFDLAVVDEAHSTT
- GDLGRPWAAIHDNSRIPADFRLYLTATPRILASPRPQKGADGRELEIATMASDPDGPYGE
- WLFELGLSEAVERGILAGFEIDVLEIRDPSPALGESEEAQRGRRLALLQTALLEHAAARN
- LRTVMTFHQRVEEAAAFAQTMPQTAARLYEAEVSAEALVDAGALPESSIGAEFYELEAGR
- HVPPDRVWAAWLCGDHLVAERREVLRQFADGLDAGNKRVHRAFLASVRVLGEGVDIVGER
- GVEAICFADTRGSQVEIVQNIGRALRPNPDGTNKTARIIVPVFLQPGENPTDMVASASFA
- PLVTVLQGLRSHSERLVEQLASRALTSGQRHVHVKRDEDGRIIGTTTEGEGGQHESEGAV
- ESALLHFSTPRDATTIAAFLRTRVYRPESLVWLEGYQALLRWRKKNHITGLYAVPYDTET
- EAGVTKAFPLGRWVHQQRRTYRAGELDPHRTTLLDEAGMVWEPGDEAWENKLAALRSFHR
- AHGHLAPRRDAVWGDADSELVPVGEHMANLRRKDGLGKNPQRAATRATQLAAIDPDWNCP
- WPLDWQRHYRVLADLATDEPHSRLPDIQPGVQFEGDDLGKWLQRQRRSWAELSEEQQQRL
- TALGVTPAEPPTPTPSAKGGGKAAAFQRGLAALAQWIQREGAHKVVPRGHVEAVVIDGQE
- HQHKLGVWISNTKTRRDKLTHDQRTALAALGVEWA
- ....
-
- - orfs
-
- >SAV_1
- ATGACCGCCGAGTGGTACGTCCTCGTCGAAGAGGACACACGAGAGACCAAGCGCGCCGAC
- GGCGTTGAACTCAGATTGCACCGCTGGAAACTGGCGGCCACTCAGCACATCGCAGGAGAT
- CAGGAACAGGCCGCCGCCGCGGCCGAGGATGCGGCCCTGAACTACATGCCGGGAGTGCTC
- GCTCGGCATGCCCGACCGGGAGACGAACCGGCCCGGCATGCTTTCCTCACCCAGGACGGG
- GCCTGGCTGGTGCTCCTCAGGCAGCGGCACCGCGAGTGTCACATACGGGTGACCACTGCC
- CGGCTCATGCATACACAGGAAGAGAAGGAGGCCCCGCCGAAAAGCTTCAAGGAGAAACTC
- CGCAGCGCCCTGGATGGTCCTCAGCCGCCCGAACCGGCTGGTAGGCCATGGAAGCCGGGC
- AGCGAAACCTGA
- >SAV_2
- GTGCCCCCTCAGGGAGCCCGTGGCACGATCGTGTCAGCTACCGGGTCCGGCAAAACGAGC
- ATGGCCGCCGCGAGCACGCTGAACTGCTTCCCCGAAGGCCGGATCCTCGTGACCGTGCCG
- ACCCTGGACCTGCTCGCACAGACCGCCCAGGCGTGGCGGGCAGTCGGCCACCACTCCCCC
- ATGATCGCGGTGTGCTCGCTGGAGAACGACCCAGTGCTGAACGAGCGGACCTGA
- >SAV_3
- ATGGACTGGAACTTCCCCGACGACGACATCTTCTTCTGCGGCGGGTGCGGCGACGACGAC
- ACCCCCGACCCGCGGGTCCCGCGTCAGGACAAGGCCCTGTGCGTCCGCTGCGACAGAGTC
- GAACGGCAGGTCCGCCGATACCGGATCACCGTGCCGCGGAGGAACGCGATCATGCGCTTC
- CAGCGCGACGTCTGCGCCCTGTGCCAGGAAGGCCCGCCGACCGACCACTGCCCCGATGCC
- GTCAGCTTCTGGCACATCGACCACGACCACCGCTGCTGCCCTCCCGGCGGCTCATGCGGG
- CGGTGCGTCCGCGGCCTCCTGTGCCTGCCCTGCAACGCCACCCGCCTGCCCGCCTACGAA
- CGCCTCCCCAACGTCCTCCGCGACAGCCCTCGCTTCAACACCTACCTCAACAGCCCACCC
- GCCCGGCACCCCGAAGCCCGCCCCACCGCCAGGGACCATGCAGGCCCCCGCGACGCATCC
- AGCTACCTCATCGACGCCTTTTTCACCGCCGCGGACCATCCCGAGGGGAACGCCCTCAGC
- TCCTGA
- >SAV_4
- GTGGCACTTACCCCAGGGGGAACCCGAGTGACGCAGTGGCAGGACCGCCAGGCGATAGGC
- GACATGCACGAACGTCGGGTGGCGGCCGCGCTGCGCGCCCGCGGCTGGACCGTCCAGCCC
- TGCGGACAGGGCACCTACCCGCCCGCCGTACGGGAAGCCCTGCGCCGGACCCGCTCCGCC
- CTGCGGCACTTCCCCGACCTCATCGCCGCCCGCGGCGCCGACCTGATCACCATCGACGCC
- AAGGACCGCATGCCCAGCACCGACACCGACCGCTACGCCGTCAGCGCCGACACCGTGACC
- GCCGGCCTCTTTTTCACCGCGGCCCACGCTCCGACTCCGCTGTACTACGTCTTCGGCGAC
- CTGAAGGTCCTCACGCCGGCGGAGGTGGTCCACTACACCGCTCACGCCTTGCGCCACCGC
- AGCGGTGCCTTCCACCTCGTACGCACGGAGCAAGCACACTGCTTCGACGACGTCTTCGGA
- TCGGCTGGCGCAGCAGCTGCGGCATGA
- >SAV_5
- ATGATGCTCCTCATGGCGGCATACGTTGACCCACGCTTTCGTCCTACGCTATGGCCTGGA
- ACGCCCGTGCCGACACCGGAGTTGATGCCTCTTCGCGGAGCGCGGGCCGACGGTGAATGG
- ATCGTCTGGACCCCGCAGGTCCGCTCCCGCTCGCACACGGTCCCCGTGCCGGAGGACTTC
- TACCTGCGCGAGTTCATGGAGGTCGACCCTGAGGACCTCGACGCCGTGGCCGCCCTGATG
- GGCGCCTACGGACACCTCGGCGGGAGCATCAACACCGGAAGCTGGGACGTCGACGTCTAC
- GAGCGCCTCAAGGAGCTCACGGAGCGCGAACACCCCCGCGCGCCGTTCGCCCTGCACGGC
- GAACTGGCCACGCTGTTCATGAGGGAGGCGCAGGCGGCCATCACCACCTGGCTGGCCCTG
- CGCCGCGAGGGCGGGCTCGACGCGCTCATCGAGCCCGAGGTGTCCGAGGAAGAACTGGCG
- CAGTGGCAAGCGAGCAACGCTGATCTTGAGGAAGCGTGGCCGCGGGACCTGGACCACCTG
- CGCGAACTCTCCCTGGAGATCAGGATCAGCAACCTCGTGAGCGAACTGAACGCCGCGCTG
- AAGCCGTTCAGCATCGGCATCGGCGGCCTGGGCGACCGCTACCCCACCATCCTCGCTGTG
- GCGTTCCTCCAGCTCTACAACCACCTCGCCGAGGACGCCACGATCCGCGAGTGCGCGAAC
- GAGACCTGCCGCCGCCACTTCGTACGCCAGCGCGGCCGCGCCGCATACGGGCAGAACCGC
- ACCAGCGGCATCAAGTACTGCACCCGCGAATGCGCCCGCGCCCAGGCCCAGCGCGAACAC
- CGCCGGCGCCGCAAACAGCAGACCACGACCCTCCAGCAGCCGCCGGCGCCTGGTCCTCAG
- TCTCACGACACCTCAGAGCCGACTGCCGAAGGGCGCTGA
- .......
-
-
-
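gbk_to_orf.py translates each extracted CDS with Biopython's `translate(table=..., to_stop=True)`. The following is a minimal standard-library sketch of what that call does for the default table 11. It assumes only unambiguous A/C/G/T codons, and the helper name `translate_to_stop` is hypothetical. Table 11 uses the same amino-acid assignments as the standard code and differs only in its permitted start codons. Because the script does not pass `cds=True`, an alternative start codon such as GTG is translated as valine, which is why SAV_2 begins with V in the example output.

```python
# Sketch only: mimics translate(table=11, to_stop=True) for plain A/C/G/T input.
# Table 11 shares the standard code's amino acids; only its start codons differ.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table, e.g. "TTT" -> "F", "TAA" -> "*".
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_to_stop(cds: str) -> str:
    """Translate codon by codon and stop before the first stop codon."""
    cds = cds.upper()
    protein = []
    # Ignore a trailing partial codon, if any.
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_to_stop("ATGACCGCCGAGTGGTACGTCTGA"))  # MTAEWYV
print(translate_to_stop("GTGCCCCCTCAG"))  # VPPQ (GTG start is read as V)
```

Non-ACGT characters raise a `KeyError` in this sketch, whereas Biopython handles IUPAC ambiguity codes, so treat it purely as an illustration of the codon lookup and the `to_stop` behaviour.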
diff -r a07c49839f31 -r d2eee6e51790 additional/gbk_to_orf.py
--- a/additional/gbk_to_orf.py Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,61 +0,0 @@
-#!/usr/bin/env python
-
-###################################################################
-##
-## gbk2orf.py by Errol Strain (estrain@gmail.com)
-##
-## Read a GenBank file and export fasta formatted amino acid and
-## CDS files
-##
-###################################################################
-
-import sys
-from optparse import OptionParser
-from Bio import SeqIO
-from Bio.Seq import Seq
-from Bio.SeqRecord import SeqRecord
-
-
-## Command line usage
-usage = "usage: %prog -g input.gbk -a aa.fasta -n nuc.fasta"
-p = OptionParser(usage)
-p.add_option("-t", "--translate", dest="transtabl", type="int", default=11,
-             help="Translation table used to translate coding regions (default=11)")
-p.add_option("-g","--genbank", dest="gb_file",help="GenBank input file")
-p.add_option("-a","--amino_acid", dest="aa_file",help="Fasta amino acid output")
-p.add_option("-n","--nucleotide", dest="orf_file",help="Fasta nucleotide output")
-(opts, args) = p.parse_args()
-## All three file options are required; fail early with a usage message
-if not (opts.gb_file and opts.aa_file and opts.orf_file):
-    p.error("-g, -a and -n are all required; use --help to see usage")
-
-## Lists to hold SeqRecords
-aalist = []
-nuclist = []
-
-## If the CDS does not have a locus tag the name will be assigned using the
-## order in which it was found
-feat_count=0
-
-## Iterate through genbank records in input file
-for gb_record in SeqIO.parse(opts.gb_file, "genbank"):
-    for feature in gb_record.features:
-        if feature.type == "CDS":
-            feat_count = feat_count + 1
-            gene = feature.extract(gb_record.seq)
-            if "locus_tag" in feature.qualifiers:
-                value = feature.qualifiers["locus_tag"][0]
-            else:
-                value = "Index_" + str(feat_count)
-            nuclist.append(SeqRecord(gene, id=value, name=value, description=""))
-            pro = gene.translate(table=opts.transtabl, to_stop=True)
-            aalist.append(SeqRecord(pro, id=value, name=value, description=""))
-
-## Write out lists in fasta format
-aa_handle = open(opts.aa_file,"w")
-SeqIO.write(aalist,aa_handle,"fasta")
-aa_handle.close()
-orf_handle = open(opts.orf_file,"w")
-SeqIO.write(nuclist,orf_handle,"fasta")
-orf_handle.close()
-
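`feature.extract()` handles strand and coordinates for locations such as `complement(1380..1811)`, and `SeqIO.write` produces the wrapped FASTA records seen in the example output (60 residues per line). A rough standard-library sketch of those two steps, plus the script's `Index_<n>` naming fallback, could look like this. `extract_location`, `fasta_record` and `record_name` are hypothetical helpers, and only simple single-span locations (no `join(...)`) are covered.

```python
# Hypothetical helpers, not part of gbk_to_orf.py.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_location(genome: str, start: int, end: int, strand: int) -> str:
    """Return bases start..end (GenBank style: 1-based, inclusive).

    strand == -1 means a complement(...) location, so the slice is
    reverse-complemented.
    """
    segment = genome[start - 1:end]
    if strand == -1:
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment


def fasta_record(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record with the sequence wrapped at `width` characters."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def record_name(qualifiers: dict, feat_count: int) -> str:
    """Use the first locus_tag if present, else Index_<n> like the script."""
    if "locus_tag" in qualifiers:
        return qualifiers["locus_tag"][0]
    return "Index_" + str(feat_count)


genome = "AACCGGTTAC"
print(extract_location(genome, 2, 4, -1))  # GGT
print(fasta_record(record_name({}, 3), "M" * 65), end="")
```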
diff -r a07c49839f31 -r d2eee6e51790 additional/glimmer2gff.py
--- a/additional/glimmer2gff.py Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,36 +0,0 @@
-#!/usr/bin/env python
-
-"""
-Input: Glimmer3 prediction
-Output: GFF3 file
-Return a GFF3 file with the genes predicted by Glimmer3
-Bjoern Gruening
-
-Note: It's not a full-fledged GFF3 file, just a very simple one.
-
-"""
-
-import sys
-
-def __main__():
-    with open(sys.argv[1], 'r') as input_file:
-        print('##gff-version 3')
-        header = None
-        for line in input_file:
-            line = line.strip()
-            if not line:
-                continue  # skip blank lines instead of failing on line[0]
-            if line.startswith('>'):
-                header = line[1:]
-                continue
-            (orf_id, start, end, frame, score) = line.split()
-            if int(end) > int(start):  # minus-strand genes are reported with start > end
-                strand = '+'
-            else:
-                strand = '-'
-                (start, end) = (end, start)
-            rest = 'frame=%s;score=%s' % (frame, score)
-            print('\t'.join([header, 'glimmer_prediction', 'predicted_gene', start, end, '.', strand, '.', rest]))
-
-if __name__ == "__main__":
-    __main__()
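glimmer2gff.py decides the strand by comparing start and end. On a circular genome Glimmer can report a gene that spans the origin (for example `00001 40137 52 +2` in the long-orfs example output, which uses the same five-column layout), and that comparison would label it minus strand. A hedged alternative sketch takes the strand from the sign of the frame column and, following the GFF3 convention for circular landmarks, reports an origin-spanning gene with its end increased by the sequence length. The function name and the `seq_len` parameter are assumptions; the source and type labels are copied from the script.

```python
from typing import Optional


def glimmer_line_to_gff(line: str, seqid: str, seq_len: Optional[int] = None) -> str:
    """Convert one 'id start end frame score' line into a tab-separated GFF3 row."""
    orf_id, start_s, end_s, frame, score = line.split()
    start, end = int(start_s), int(end_s)
    # Strand comes from the frame sign (e.g. "-2"), not from coordinate order.
    strand = "-" if frame.startswith("-") else "+"
    # Glimmer lists the start codon position first, so minus-strand genes
    # normally have start > end; GFF3 wants low <= high.
    low, high = (start, end) if strand == "+" else (end, start)
    if low > high:
        # Gene wraps around the origin of a circular sequence.
        if seq_len is None:
            raise ValueError("origin-spanning gene %s needs seq_len" % orf_id)
        high += seq_len
    attributes = "frame=%s;score=%s" % (frame, score)
    return "\t".join([seqid, "glimmer_prediction", "predicted_gene",
                      str(low), str(high), ".", strand, ".", attributes])


print(glimmer_line_to_gff("orf00003 2869 497 -2 5.60", "contig00097"))
print(glimmer_line_to_gff("00001 40137 52 +2 0.892", "CELF22B7", seq_len=40222))
```

For ordinary (non-wrapping) genes this produces the same rows as the script; the two approaches only diverge for origin-spanning predictions.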
diff -r a07c49839f31 -r d2eee6e51790 additional/glimmer2gff.xml
--- a/additional/glimmer2gff.xml Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,63 +0,0 @@
-
- Converts Glimmer Files to GFF Files
-
- glimmer2gff.py
- $input > $output
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-Converts a Glimmer3 prediction file to a simple GFF3 annotation file.
-
-**Example**
-
-Input::
- >contig00097 sbe.0.234
- orf00003 2869 497 -2 5.60
- orf00005 3894 2875 -1 7.05
- orf00007 4242 4826 +3 8.04
- orf00010 4846 5403 +1 8.57
- orf00012 6858 5413 -1 10.87
- orf00013 6857 7594 +2 3.61
- orf00014 7751 9232 +2 11.34
- orf00015 9374 10357 +2 10.66
- orf00017 10603 11196 +1 13.39
- orf00021 11303 11911 +2 8.81
- orf00025 14791 12050 -2 13.51
- orf00026 15216 16199 +3 6.37
- orf00028 16333 16935 +1 8.86
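-
-Output (tab-separated GFF3)::
-
-    ##gff-version 3
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 497 2869 . - . frame=-2;score=5.60
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 2875 3894 . - . frame=-1;score=7.05
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 4242 4826 . + . frame=+3;score=8.04
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 4846 5403 . + . frame=+1;score=8.57
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 5413 6858 . - . frame=-1;score=10.87
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 6857 7594 . + . frame=+2;score=3.61
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 7751 9232 . + . frame=+2;score=11.34
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 9374 10357 . + . frame=+2;score=10.66
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 10603 11196 . + . frame=+1;score=13.39
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 11303 11911 . + . frame=+2;score=8.81
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 12050 14791 . - . frame=-2;score=13.51
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 15216 16199 . + . frame=+3;score=6.37
-    contig00097 sbe.0.234 glimmer_prediction predicted_gene 16333 16935 . + . frame=+1;score=8.86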
-
------
-
-
-
-
diff -r a07c49839f31 -r d2eee6e51790 additional/glimmer3-extract-wrapper.xml
--- a/additional/glimmer3-extract-wrapper.xml Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,127 +0,0 @@
-
diff -r a07c49839f31 -r d2eee6e51790 additional/glimmer3-long-orfs-wrapper.xml
--- a/additional/glimmer3-long-orfs-wrapper.xml Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,125 +0,0 @@
-
- identify long, non-overlapping ORFs (glimmer)
-
- glimmer
-
-
- long-orfs
- -n -t
- $cutoff
- $inputfile
- $output
- 2>&1
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
- This program identifies long, non-overlapping open reading frames (orfs) in a DNA sequence file.
- These orfs are very likely to contain genes and can be used as a set of training sequences for build-icm.
- More specifically, among all orfs longer than a minimum gene length, those that do not overlap any others are output. The start codon used for
- each orf is the first possible one. By default, the program automatically determines the minimum gene length
- that maximizes the number of orfs that are output. With the -t option, the initial
- set of candidate orfs can also be filtered using entropy distance, which generally produces
- a larger, more accurate training set, particularly for high-GC-content genomes.
-
-
-
------
-
-**Glimmer Overview**
-
-::
-
-  **************      **************      **************      **************
-  *            *      *            *      *            *      *            *
-  * long-orfs  * ===> *  Extract   * ===> * build-icm  * ===> *  glimmer3  *
-  *            *      *            *      *            *      *            *
-  **************      **************      **************      **************
-
------
-
-**Example**
-
-
-* input::
-
- -Genome Sequence
-
- CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7
- GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT
- GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT
- TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT
- TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC
- GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA
- ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG
- AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA
- CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA
- TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC
- AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA
- GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC
- AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC
- CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA
- AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC
- GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT
- .....
-
- - Cutoff 1.5
-
-* output::
-
- Sequence file = /home/mohammed/galaxy-central/database/files/000/dataset_34.dat
- Excluded regions file = none
- Circular genome = true
- Initial minimum gene length = 90 bp
- Determine optimal min gene length to maximize number of genes
- Maximum overlap bases = 30
- Start codons = atg,gtg,ttg
- Stop codons = taa,tag,tga
- Sequence length = 40222
- Final minimum gene length = 97
-
- Putative Genes:
- 00001 40137 52 +2 0.892
- 00002 1319 1095 -3 0.654
- 00003 1555 1391 -2 0.793
- 00004 1953 2066 +3 1.078
- 00005 2045 2146 +2 0.919
- 00006 4463 4759 +2 0.985
- 00007 6785 6582 -3 1.033
- 00008 6862 7020 +1 0.915
- 00009 7300 7488 +1 0.900
- 00010 7463 7570 +2 0.912
- 00011 8399 8527 +2 1.044
- 00012 10652 10545 -3 0.895
- 00013 12170 12066 -3 1.108
- 00014 13891 13748 -2 0.998
- 00015 14157 14044 -1 1.026
- 00016 15285 15410 +3 0.928
- 00017 15829 15704 -2 0.949
-
- ....
-
--------
-
-**References**
-
-A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23(6):673-679 (2007).
-
-
-
-
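The selection rule described under **What it does** can be illustrated with a deliberately simplified sketch. It keeps the ORFs of at least a given length that overlap no other candidate by more than the maximum overlap (30 bases in the example run), then picks the minimum length that keeps the most ORFs. ORFs are assumed to be `(low, high)` 1-based inclusive intervals on a linear sequence. Entropy filtering (`-t`), start-codon selection and circular wrap-around are not modelled, and all names are hypothetical rather than long-orfs internals.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (low, high), 1-based inclusive


def overlap(a: Interval, b: Interval) -> int:
    """Number of shared bases between two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def non_overlapping(orfs: Iterable[Interval], min_len: int,
                    max_overlap: int = 30) -> List[Interval]:
    """Keep ORFs of length >= min_len that overlap no other such ORF too much."""
    candidates = [o for o in orfs if o[1] - o[0] + 1 >= min_len]
    return [
        a for i, a in enumerate(candidates)
        if all(overlap(a, b) <= max_overlap
               for j, b in enumerate(candidates) if i != j)
    ]


def best_min_len(orfs: List[Interval], lengths: Iterable[int],
                 max_overlap: int = 30) -> int:
    """Pick the minimum length that keeps the most ORFs (ties: shorter length)."""
    return max(lengths,
               key=lambda n: (len(non_overlapping(orfs, n, max_overlap)), -n))


orfs = [(1, 100), (50, 400), (500, 800)]
print(non_overlapping(orfs, 90))       # [(500, 800)]
print(best_min_len(orfs, [90, 200]))   # 200
```

Raising the minimum length can increase the output here because it removes the short ORF that was overlapping (and therefore disqualifying) a longer one, which is the trade-off the automatic length search explores.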
diff -r a07c49839f31 -r d2eee6e51790 additional/glimmer_acgt_content.xml
--- a/additional/glimmer_acgt_content.xml Sun Jun 09 07:57:22 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,55 +0,0 @@
-
- of windows in each sequence
-
- glimmer
-
-
- window-acgt
- $percentage
- $input_win_len
- $input_win_skip
- < $infile > $output
-
- ##TODO prettify the output
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-This tool slides a window along each input sequence and reports the A, C, G and T counts and the GC content of every window.
-
--------
-
-**Output**
-
-Output is in the format::
-
- window-start window-len A's C's G's T's #other %GC
-
-Note that the last window in a sequence can be shorter than *window-len* if the sequence ends before the window is full.
-
-
-
-
-**References**
-
-A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23(6):673-679 (2007).
-
-
-
-
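A rough sketch of the per-window counts that window-acgt reports, using the column order documented above. Several details are assumptions, not facts about the Glimmer tool: `window-start` is taken to be 1-based, `window-len` is the actual length of the (possibly shorter) last window, %GC is computed over A/C/G/T only, and windows advance by the skip value until one reaches the end of the sequence.

```python
from typing import List, Tuple

# (window-start, window-len, A, C, G, T, #other, %GC)
Row = Tuple[int, int, int, int, int, int, int, float]


def acgt_windows(seq: str, win_len: int, skip: int) -> List[Row]:
    """Count bases in windows of win_len, moving the window by skip each step."""
    if win_len <= 0 or skip <= 0:
        raise ValueError("win_len and skip must be positive")
    rows: List[Row] = []
    seq = seq.upper()
    start = 0
    while start < len(seq):
        window = seq[start:start + win_len]  # may be short at the end
        a, c, g, t = (window.count(b) for b in "ACGT")
        acgt = a + c + g + t
        other = len(window) - acgt
        gc = 100.0 * (g + c) / acgt if acgt else 0.0
        rows.append((start + 1, len(window), a, c, g, t, other, gc))
        if start + win_len >= len(seq):
            break  # this window already reached the end of the sequence
        start += skip
    return rows


for row in acgt_windows("ACGTNNGGCA", win_len=4, skip=4):
    print(*row)
```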