Mercurial > repos > bgruening > glimmer
changeset 7:d2eee6e51790
Uploaded
author | bgruening |
---|---|
date | Mon, 10 Jun 2013 16:00:34 -0400 |
parents | a07c49839f31 |
children | ec5cf10b8db7 |
files | additional/gbk2orf.xml additional/gbk_to_orf.py additional/glimmer2gff.py additional/glimmer2gff.xml additional/glimmer3-extract-wrapper.xml additional/glimmer3-long-orfs-wrapper.xml additional/glimmer_acgt_content.xml |
diffstat | 7 files changed, 0 insertions(+), 679 deletions(-) |
--- a/additional/gbk2orf.xml Sun Jun 09 07:57:22 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,212 +0,0 @@ -<tool id="gbkToORF" name="Extract ORF" version="0.1"> - <description>from a GenBank file</description> - <command interpreter="python"> - gbk_to_orf.py - -g $infile - -a $aa_output - -n $nc_output - ##TODO translation table, can be extracted from genbank file directly - </command> - <inputs> - <param name="infile" type='data' format="genbank" label="gene bank file"/> - </inputs> - <outputs> - <data name="aa_output" format="fasta" /> - <data name="nc_output" format="fasta" /> - </outputs> - <tests> - <test> - </test> - </tests> - <help> - - -**What it does** -Read a GenBank file and export fasta formatted amino acid and CDS files. - - ------ - -**Example** - * input:: - - Genebankfile - - LOCUS BA000030 9025608 bp DNA linear BCT 21-DEC-2007 - DEFINITION Streptomyces avermitilis MA-4680 DNA, complete genome. - ACCESSION BA000030 AP005021-AP005050 - VERSION BA000030.3 GI:148878541 - DBLINK Project: 189 - KEYWORDS . - SOURCE Streptomyces avermitilis MA-4680 - ORGANISM Streptomyces avermitilis MA-4680 - Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales; - Streptomycineae; Streptomycetaceae; Streptomyces. - REFERENCE 1 - AUTHORS Omura,S., Ikeda,H., Ishikawa,J., Hanamoto,A., Takahashi,C., - Shinose,M., Takahashi,Y., Horikawa,H., Nakazawa,H., Osonoe,T., - Kikuchi,H., Shiba,T., Sakaki,Y. and Hattori,M. - TITLE Genome sequence of an industrial microorganism Streptomyces - avermitilis: deducing the ability of producing secondary - metabolites - JOURNAL Proc. Natl. Acad. Sci. U.S.A. 98 (21), 12215-12220 (2001) - PUBMED 11572948 - REFERENCE 2 - AUTHORS Ikeda,H., Ishikawa,J., Hanamoto,A., Shinose,M., Kikuchi,H., - Shiba,T., Sakaki,Y., Hattori,M. and Omura,S. - TITLE Complete genome sequence and comparative analysis of the industrial - microorganism Streptomyces avermitilis - JOURNAL Nat. Biotechnol. 
21 (5), 526-531 (2003) - PUBMED 12692562 - REFERENCE 3 (bases 1 to 9025608) - AUTHORS Omura,S., Ikeda,H., Ishikawa,J., Hanamoto,A., Takahashi,C., - Shinose,M., Takahashi,Y., Horikawa,H., Nakazawa,H., Osonoe,T., - Kushida,N., Shiba,T., Sakaki,Y. and Hattori,M. - TITLE Direct Submission - JOURNAL Submitted (29-MAR-2002) Contact:S Omura Kitasato University, - Kitasato Institute for Life Sciences; 1-15-1 Kitasato, Sagamihara, - Kanagawa 228-8555, Japan URL - :http://avermitilis.ls.kitasato-u.ac.jp/ - COMMENT On Jun 15, 2007 this sequence version replaced gi:57546753. - This work was done in collaboration with Haruo Ikeda(*1), Jun - Ishikawa(*2), Akiharu Hanamoto(*3), Chigusa Takahashi(*3), Mayumi - Shinose(*3), Hiroshi Horikawa(*4), Hidekazu Nakazawa(*4), Tomomi - Osonoe(*4), Norihiro Kushida(*4), Hisashi Kikuchi(*4), Tadayoshi - Shiba(*5), Yoshiyuki Sakaki(*6,*7), Masahira Hattori(*1,*7) - and Satoshi Omura(*1,*3). - Final finishing process and all annotation were done by H. Ikeda - and J. Ishikawa. - *1 Kitasato Institute for Life Sciences, Kitasato University *2 - National Institute of Infectious Diseases - *3 The Kitasato Institute - *4 National Institute of Technology and Evaluation *5 School of - Science, Kitasato University - *6 Institute of Medical Science, University of Tokyo *7 RIKEN, - Genomic Sciences Center - All the annotated genes identified are available from following - urls. - http://avermitilis.ls.kitasato-u.ac.jp. - FEATURES Location/Qualifiers - source 1..9025608 - /organism="Streptomyces avermitilis MA-4680" - /mol_type="genomic DNA" - /strain="MA-4680" - /db_xref="taxon:227882" - /note="This strain is also named as strain: ATCC 31267, - NCIMB 12804 or NRRL 8165." 
- gene complement(1380..1811) - /locus_tag="SAV_1" - CDS complement(1380..1811) - /locus_tag="SAV_1" - /codon_start=1 - /transl_table=11 - /product="hypothetical protein" - /protein_id="BAC67710.1" - /db_xref="GI:29603637" - /translation="MTAEWYVLVEEDTRETKRADGVELRLHRWKLAATQHIAGDQEQA - AAAAEDAALNYMPGVLARHARPGDEPARHAFLTQDGAWLVLLRQRHRECHIRVTTARL - MHTQEEKEAPPKSFKEKLRSALDGPQPPEPAGRPWKPGSET" - - -* output:: - - - aminoAcidOutput - >SAV_1 - MTAEWYVLVEEDTRETKRADGVELRLHRWKLAATQHIAGDQEQAAAAAEDAALNYMPGVL - ARHARPGDEPARHAFLTQDGAWLVLLRQRHRECHIRVTTARLMHTQEEKEAPPKSFKEKL - RSALDGPQPPEPAGRPWKPGSET - >SAV_2 - VPPQGARGTIVSATGSGKTSMAAASTLNCFPEGRILVTVPTLDLLAQTAQAWRAVGHHSP - MIAVCSLENDPVLNERT - >SAV_3 - MDWNFPDDDIFFCGGCGDDDTPDPRVPRQDKALCVRCDRVERQVRRYRITVPRRNAIMRF - QRDVCALCQEGPPTDHCPDAVSFWHIDHDHRCCPPGGSCGRCVRGLLCLPCNATRLPAYE - RLPNVLRDSPRFNTYLNSPPARHPEARPTARDHAGPRDASSYLIDAFFTAADHPEGNALS - S - >SAV_4 - VALTPGGTRVTQWQDRQAIGDMHERRVAAALRARGWTVQPCGQGTYPPAVREALRRTRSA - LRHFPDLIAARGADLITIDAKDRMPSTDTDRYAVSADTVTAGLFFTAAHAPTPLYYVFGD - LKVLTPAEVVHYTAHALRHRSGAFHLVRTEQAHCFDDVFGSAGAAAAA - >SAV_5 - MMLLMAAYVDPRFRPTLWPGTPVPTPELMPLRGARADGEWIVWTPQVRSRSHTVPVPEDF - YLREFMEVDPEDLDAVAALMGAYGHLGGSINTGSWDVDVYERLKELTEREHPRAPFALHG - ELATLFMREAQAAITTWLALRREGGLDALIEPEVSEEELAQWQASNADLEEAWPRDLDHL - RELSLEIRISNLVSELNAALKPFSIGIGGLGDRYPTILAVAFLQLYNHLAEDATIRECAN - ETCRRHFVRQRGRAAYGQNRTSGIKYCTRECARAQAQREHRRRRKQQTTTLQQPPAPGPQ - SHDTSEPTAEGR - >SAV_6 - MISLREHQVEANARIRAWAGFPTRSPVPAQGLRGTVVSATGSGKTITAAWAARECFRGGR - ILVMVPTLDLLVQTAQAWRRVGHNGPMVAACSLEKDEVLEQLGVRTTTNPIQLALWAGHG - PVVVFATYASLVDREDPEDVTGRAKVRGPLEAALAGGQRLYGQTMDGFDLAVVDEAHSTT - GDLGRPWAAIHDNSRIPADFRLYLTATPRILASPRPQKGADGRELEIATMASDPDGPYGE - WLFELGLSEAVERGILAGFEIDVLEIRDPSPALGESEEAQRGRRLALLQTALLEHAAARN - LRTVMTFHQRVEEAAAFAQTMPQTAARLYEAEVSAEALVDAGALPESSIGAEFYELEAGR - HVPPDRVWAAWLCGDHLVAERREVLRQFADGLDAGNKRVHRAFLASVRVLGEGVDIVGER - GVEAICFADTRGSQVEIVQNIGRALRPNPDGTNKTARIIVPVFLQPGENPTDMVASASFA - PLVTVLQGLRSHSERLVEQLASRALTSGQRHVHVKRDEDGRIIGTTTEGEGGQHESEGAV - 
ESALLHFSTPRDATTIAAFLRTRVYRPESLVWLEGYQALLRWRKKNHITGLYAVPYDTET - EAGVTKAFPLGRWVHQQRRTYRAGELDPHRTTLLDEAGMVWEPGDEAWENKLAALRSFHR - AHGHLAPRRDAVWGDADSELVPVGEHMANLRRKDGLGKNPQRAATRATQLAAIDPDWNCP - WPLDWQRHYRVLADLATDEPHSRLPDIQPGVQFEGDDLGKWLQRQRRSWAELSEEQQQRL - TALGVTPAEPPTPTPSAKGGGKAAAFQRGLAALAQWIQREGAHKVVPRGHVEAVVIDGQE - HQHKLGVWISNTKTRRDKLTHDQRTALAALGVEWA - .... - - - orfs - - >SAV_1 - ATGACCGCCGAGTGGTACGTCCTCGTCGAAGAGGACACACGAGAGACCAAGCGCGCCGAC - GGCGTTGAACTCAGATTGCACCGCTGGAAACTGGCGGCCACTCAGCACATCGCAGGAGAT - CAGGAACAGGCCGCCGCCGCGGCCGAGGATGCGGCCCTGAACTACATGCCGGGAGTGCTC - GCTCGGCATGCCCGACCGGGAGACGAACCGGCCCGGCATGCTTTCCTCACCCAGGACGGG - GCCTGGCTGGTGCTCCTCAGGCAGCGGCACCGCGAGTGTCACATACGGGTGACCACTGCC - CGGCTCATGCATACACAGGAAGAGAAGGAGGCCCCGCCGAAAAGCTTCAAGGAGAAACTC - CGCAGCGCCCTGGATGGTCCTCAGCCGCCCGAACCGGCTGGTAGGCCATGGAAGCCGGGC - AGCGAAACCTGA - >SAV_2 - GTGCCCCCTCAGGGAGCCCGTGGCACGATCGTGTCAGCTACCGGGTCCGGCAAAACGAGC - ATGGCCGCCGCGAGCACGCTGAACTGCTTCCCCGAAGGCCGGATCCTCGTGACCGTGCCG - ACCCTGGACCTGCTCGCACAGACCGCCCAGGCGTGGCGGGCAGTCGGCCACCACTCCCCC - ATGATCGCGGTGTGCTCGCTGGAGAACGACCCAGTGCTGAACGAGCGGACCTGA - >SAV_3 - ATGGACTGGAACTTCCCCGACGACGACATCTTCTTCTGCGGCGGGTGCGGCGACGACGAC - ACCCCCGACCCGCGGGTCCCGCGTCAGGACAAGGCCCTGTGCGTCCGCTGCGACAGAGTC - GAACGGCAGGTCCGCCGATACCGGATCACCGTGCCGCGGAGGAACGCGATCATGCGCTTC - CAGCGCGACGTCTGCGCCCTGTGCCAGGAAGGCCCGCCGACCGACCACTGCCCCGATGCC - GTCAGCTTCTGGCACATCGACCACGACCACCGCTGCTGCCCTCCCGGCGGCTCATGCGGG - CGGTGCGTCCGCGGCCTCCTGTGCCTGCCCTGCAACGCCACCCGCCTGCCCGCCTACGAA - CGCCTCCCCAACGTCCTCCGCGACAGCCCTCGCTTCAACACCTACCTCAACAGCCCACCC - GCCCGGCACCCCGAAGCCCGCCCCACCGCCAGGGACCATGCAGGCCCCCGCGACGCATCC - AGCTACCTCATCGACGCCTTTTTCACCGCCGCGGACCATCCCGAGGGGAACGCCCTCAGC - TCCTGA - >SAV_4 - GTGGCACTTACCCCAGGGGGAACCCGAGTGACGCAGTGGCAGGACCGCCAGGCGATAGGC - GACATGCACGAACGTCGGGTGGCGGCCGCGCTGCGCGCCCGCGGCTGGACCGTCCAGCCC - TGCGGACAGGGCACCTACCCGCCCGCCGTACGGGAAGCCCTGCGCCGGACCCGCTCCGCC - CTGCGGCACTTCCCCGACCTCATCGCCGCCCGCGGCGCCGACCTGATCACCATCGACGCC - AAGGACCGCATGCCCAGCACCGACACCGACCGCTACGCCGTCAGCGCCGACACCGTGACC 
- GCCGGCCTCTTTTTCACCGCGGCCCACGCTCCGACTCCGCTGTACTACGTCTTCGGCGAC - CTGAAGGTCCTCACGCCGGCGGAGGTGGTCCACTACACCGCTCACGCCTTGCGCCACCGC - AGCGGTGCCTTCCACCTCGTACGCACGGAGCAAGCACACTGCTTCGACGACGTCTTCGGA - TCGGCTGGCGCAGCAGCTGCGGCATGA - >SAV_5 - ATGATGCTCCTCATGGCGGCATACGTTGACCCACGCTTTCGTCCTACGCTATGGCCTGGA - ACGCCCGTGCCGACACCGGAGTTGATGCCTCTTCGCGGAGCGCGGGCCGACGGTGAATGG - ATCGTCTGGACCCCGCAGGTCCGCTCCCGCTCGCACACGGTCCCCGTGCCGGAGGACTTC - TACCTGCGCGAGTTCATGGAGGTCGACCCTGAGGACCTCGACGCCGTGGCCGCCCTGATG - GGCGCCTACGGACACCTCGGCGGGAGCATCAACACCGGAAGCTGGGACGTCGACGTCTAC - GAGCGCCTCAAGGAGCTCACGGAGCGCGAACACCCCCGCGCGCCGTTCGCCCTGCACGGC - GAACTGGCCACGCTGTTCATGAGGGAGGCGCAGGCGGCCATCACCACCTGGCTGGCCCTG - CGCCGCGAGGGCGGGCTCGACGCGCTCATCGAGCCCGAGGTGTCCGAGGAAGAACTGGCG - CAGTGGCAAGCGAGCAACGCTGATCTTGAGGAAGCGTGGCCGCGGGACCTGGACCACCTG - CGCGAACTCTCCCTGGAGATCAGGATCAGCAACCTCGTGAGCGAACTGAACGCCGCGCTG - AAGCCGTTCAGCATCGGCATCGGCGGCCTGGGCGACCGCTACCCCACCATCCTCGCTGTG - GCGTTCCTCCAGCTCTACAACCACCTCGCCGAGGACGCCACGATCCGCGAGTGCGCGAAC - GAGACCTGCCGCCGCCACTTCGTACGCCAGCGCGGCCGCGCCGCATACGGGCAGAACCGC - ACCAGCGGCATCAAGTACTGCACCCGCGAATGCGCCCGCGCCCAGGCCCAGCGCGAACAC - CGCCGGCGCCGCAAACAGCAGACCACGACCCTCCAGCAGCCGCCGGCGCCTGGTCCTCAG - TCTCACGACACCTCAGAGCCGACTGCCGAAGGGCGCTGA - ....... - - </help> -</tool>
--- a/additional/gbk_to_orf.py Sun Jun 09 07:57:22 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,61 +0,0 @@ -#!/usr/bin/env python - -################################################################### -## -## gbk2orf.py by Errol Strain (estrain@gmail.com) -## -## Read a GenBank file and export fasta formatted amino acid and -## CDS files -## -################################################################### - -import sys -from optparse import OptionParser -from Bio import SeqIO -from Bio.Seq import Seq -from Bio.SeqRecord import SeqRecord - - -## Command line usage -usage = "usage: %prog -g input.gbk -a aa.fasta -n nuc.fasta" -p = OptionParser(usage) -p.add_option("-t","--translate", dest="transtabl",type="int",default=11, - help="Translation table used to translate coding regions (default=11)") -p.add_option("-g","--genbank", dest="gb_file",help="GenBank input file") -p.add_option("-a","--amino_acid", dest="aa_file",help="Fasta amino acid output") -p.add_option("-n","--nucleotide", dest="orf_file",help="Fasta nucleotide output") -(opts, args) = p.parse_args() -## Do I need this next line? 
-if not opts and not args : p.error("Use --help to see usage") -if len(sys.argv)==1 : p.error("Use --help to see usage") - -## Lists to hold SeqRecords -aalist = [] -nuclist = [] - -## If the CDS does not have a locus tag the name will be assigned using the -## order in which it was found -feat_count=0 - -## Iterate through genbank records in input file -for gb_record in SeqIO.parse(open(opts.gb_file,"r"), "genbank") : - for (index, feature) in enumerate(gb_record.features) : - if feature.type=="CDS" : - feat_count = feat_count + 1 - gene = feature.extract(gb_record.seq) - if "locus_tag" in feature.qualifiers : - value = feature.qualifiers["locus_tag"][0] - else : - value = "Index_" + str(feat_count) - nuclist.append(SeqRecord(Seq(str(gene)),id=value,name=value)) - pro=Seq(str(gene.translate(table=opts.transtabl,to_stop=True))) - aalist.append(SeqRecord(pro,id=value,name=value)) - -## Write out lists in fasta format -aa_handle = open(opts.aa_file,"w") -SeqIO.write(aalist,aa_handle,"fasta") -aa_handle.close() -orf_handle = open(opts.orf_file,"w") -SeqIO.write(nuclist,orf_handle,"fasta") -orf_handle.close() -
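The removed `gbk_to_orf.py` asks "Do I need this next line?" about its manual usage checks. As a minimal sketch only (not part of this changeset), the same four options could be declared with `argparse`, where `required=True` makes those checks unnecessary. The option names and `dest` values are copied from the removed script; the `build_parser` helper and the decision to make `-g`, `-a` and `-n` mandatory are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical argparse version of the optparse setup in gbk_to_orf.py."""
    parser = argparse.ArgumentParser(
        description="Read a GenBank file and export FASTA formatted "
                    "amino acid and CDS files"
    )
    parser.add_argument("-t", "--translate", dest="transtabl", type=int, default=11,
                        help="Translation table used to translate coding regions (default=11)")
    # Assumption: the three file options are mandatory, so argparse itself
    # rejects an empty command line and no manual sys.argv check is needed.
    parser.add_argument("-g", "--genbank", dest="gb_file", required=True,
                        help="GenBank input file")
    parser.add_argument("-a", "--amino_acid", dest="aa_file", required=True,
                        help="Fasta amino acid output")
    parser.add_argument("-n", "--nucleotide", dest="orf_file", required=True,
                        help="Fasta nucleotide output")
    return parser


if __name__ == "__main__":
    # Example command line taken from the usage string of the removed script.
    opts = build_parser().parse_args(["-g", "input.gbk", "-a", "aa.fasta", "-n", "nuc.fasta"])
    print(opts.gb_file, opts.aa_file, opts.orf_file, opts.transtabl)
```

With this layout, calling the script without arguments exits with a usage message generated by `argparse`.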
--- a/additional/glimmer2gff.py Sun Jun 09 07:57:22 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,36 +0,0 @@ -#!/usr/bin/env python - -""" -Input: Glimmer3 prediction -Output: GFF3 file -Return a GFF3 file with the genes predicted by Glimmer3 -Bjoern Gruening - -Note: Its not a full-fledged GFF3 file, its a really simple one. - -""" - -import sys, re - -def __main__(): - input_file = open(sys.argv[1], 'r') - - print '##gff-version 3\n' - for line in input_file: - line = line.strip() - if line[0] == '>': - header = line[1:] - else: - (id, start, end, frame, score) = re.split('\s+', line) - if int(end) > int(start): - strand = '+' - else: - strand = '-' - (start, end) = (end, start) - - rest = 'frame=%s;score=%s' % (frame, score) - print '\t'.join([header, 'glimmer_prediction', 'predicted_gene', start, end, '.', strand, '.', rest]) - - -if __name__ == "__main__" : - __main__()
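For reference, the conversion performed by the removed `glimmer2gff.py` can be sketched in Python 3 roughly as follows. This is an illustration under stated assumptions, not the author's code: the function name `glimmer_to_gff3` is hypothetical, blank lines are skipped (the removed script indexes `line[0]` and would fail on them), only the first whitespace-delimited token of the `>` header is used as the sequence ID, and the strand is taken from the sign of the frame column instead of comparing start and end. ORFs that wrap around the origin of a circular genome are not handled specially.

```python
import re


def glimmer_to_gff3(lines):
    """Yield simple GFF3 lines for an iterable of Glimmer3 prediction lines."""
    yield "##gff-version 3"
    seqid = "unknown"  # assumption: placeholder used if no '>' header was seen yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines instead of failing on line[0]
        if line.startswith(">"):
            # Assumption: keep only the first token so the seqid has no spaces.
            tokens = line[1:].split()
            seqid = tokens[0] if tokens else "unknown"
            continue
        orf_id, start, end, frame, score = re.split(r"\s+", line)
        # Assumption: the sign of the frame column gives the strand.
        strand = "-" if frame.startswith("-") else "+"
        low, high = sorted((int(start), int(end)))
        attributes = "frame=%s;score=%s" % (frame, score)
        yield "\t".join([seqid, "glimmer_prediction", "predicted_gene",
                         str(low), str(high), ".", strand, ".", attributes])


if __name__ == "__main__":
    example = [
        ">contig00097 sbe.0.234",
        "orf00003 2869 497 -2 5.60",
        "orf00007 4242 4826 +3 8.04",
    ]
    for row in glimmer_to_gff3(example):
        print(row)
```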
--- a/additional/glimmer2gff.xml Sun Jun 09 07:57:22 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,63 +0,0 @@ -<tool id="glimmer2gff" name="Convert Glimmer to GFF" version="0.1"> - <description>Converts Glimmer Files to GFF Files</description> - <command interpreter="python"> - glimmer2gff.py - $input > $output - </command> - <inputs> - <param name="input" type="data" format="tabular" label="Glimmer Output File"/> - </inputs> - <outputs> - <data name="output" type="data" format="gff"/> - </outputs> - <tests> - <test> - - </test> - </tests> - <help> - -**What it does** - -Converts a Glimmer3 output File to an GFF Annotation File:: - -**Example** - -Input:: - >contig00097 sbe.0.234 - orf00003 2869 497 -2 5.60 - orf00005 3894 2875 -1 7.05 - orf00007 4242 4826 +3 8.04 - orf00010 4846 5403 +1 8.57 - orf00012 6858 5413 -1 10.87 - orf00013 6857 7594 +2 3.61 - orf00014 7751 9232 +2 11.34 - orf00015 9374 10357 +2 10.66 - orf00017 10603 11196 +1 13.39 - orf00021 11303 11911 +2 8.81 - orf00025 14791 12050 -2 13.51 - orf00026 15216 16199 +3 6.37 - orf00028 16333 16935 +1 8.86 - - -Output: - contig00097 sbe.0.234 glimmer gene 497 2869 . - . -2 5.60 - contig00097 sbe.0.234 glimmer gene 2875 3894 . - . -1 7.05 - contig00097 sbe.0.234 glimmer gene 4242 4826 . + . +3 8.04 - contig00097 sbe.0.234 glimmer gene 4846 5403 . + . +1 8.57 - contig00097 sbe.0.234 glimmer gene 5413 6858 . - . -1 10.87 - contig00097 sbe.0.234 glimmer gene 6857 7594 . + . +2 3.61 - contig00097 sbe.0.234 glimmer gene 7751 9232 . + . +2 11.34 - contig00097 sbe.0.234 glimmer gene 9374 10357 . + . +2 10.66 - contig00097 sbe.0.234 glimmer gene 10603 11196 . + . +1 13.39 - contig00097 sbe.0.234 glimmer gene 11303 11911 . + . +2 8.81 - contig00097 sbe.0.234 glimmer gene 12050 14791 . - . -2 13.51 - contig00097 sbe.0.234 glimmer gene 15216 16199 . + . +3 6.37 - contig00097 sbe.0.234 glimmer gene 16333 16935 . + . +1 8.86 - - ------ - - - </help> -</tool>
--- a/additional/glimmer3-extract-wrapper.xml Sun Jun 09 07:57:22 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,127 +0,0 @@ -<tool id="glimmer_extract" name="glimmer3-extract" version="0.1"> - <description></description> - <requirements> - <requirement type="package" version="3.02b">glimmer</requirement> - </requirements> - <command> - extract - -t - $seqInput - $cordInput > $output - 2> /dev/null - </command> - <inputs> - <param name="seqInput" type="data" format="fasta" label="Genome Sequence" help="Dataset missing? See TIP below"/> - <param name="cordInput" type="data" label="Coordinates" help="Dataset missing? See TIP below"/> - </inputs> - <outputs> - <data format="fasta" name="output" /> - </outputs> - <tests> - <test> - <param name="seqInput" value='glimmer3/seqTest.fa'/> - <param name="cordInput" value='glimmer3/cordTest.txt'/> - <output name="output" file='glimmer3/extractTestOutput.dat'/> - </test> - </tests> - - <help> - -**What it does** - - This program reads a genome sequence and a list of coordinates for it and outputs a multi- - fasta file of the regions specified by the coordinates. 
- ------ - -**Glimmer Overview** - -:: - -************** ************** ************** ************** -* * * * * * * * -* long-orfs * ===> * Extract * ===> * build-icm * ===> * glimmer3 * -* * * * * * * * -************** ************** ************** ************** - ------ - -**Example** - - -* input :: - - -Genome Sequence - - CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 - GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT - GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT - TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT - TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC - GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA - ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG - AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA - CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA - TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC - AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA - GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC - AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC - CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA - AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC - GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT - ... - - - Coorinates - - 00001 40137 52 +2 0.892 - 00002 1319 1095 -3 0.654 - 00003 1555 1391 -2 0.793 - 00004 1953 2066 +3 1.078 - 00005 2045 2146 +2 0.919 - 00006 4463 4759 +2 0.985 - 00007 6785 6582 -3 1.033 - 00008 6862 7020 +1 0.915 - 00009 7300 7488 +1 0.900 - 00010 7463 7570 +2 0.912 - 00011 8399 8527 +2 1.044 - 00012 10652 10545 -3 0.895 - 00013 12170 12066 -3 1.108 - 00014 13891 13748 -2 0.998 - 00015 14157 14044 -1 1.026 - 00016 15285 15410 +3 0.928 - 00017 15829 15704 -2 0.949 - ... 
- -* output:: - - >00001 40137 52 len=135 - ATGACACATTTGCTCGTTGCTTTGACCCACTACGAGGCCAGTATCATGATTTCTAGAAAA - ACCCTCTTTTTGACTTCTTCCTCCATGATCCTTGTAGATTTTGAATTTGAAGTTTTTTCT - CATTCCAAAACTCTG - - >00002 1319 1095 len=222 - TTGGCTCGCCGTTTTGGAGTCCGTGCTGGAATGCCTGGCTTCATCTCAAATAAACTTTGT - CCGAGTCTAACGATTGTTCCAGGAAATTACCCTAAATACACTAAAGTCAGTCGCCAATTT - TCACAAATTTTCATGGAATACGATTCGGATGTTGGAATGATGTCATTGGATGAGGCATTT - ATAGATTTGACAGACTATGTGGCAAGTAATACAGAAAAAAGT - - >00003 1555 1391 len=162 - ATGGAGAATCTTGAGATGAAACTGGAATCATCTAGAGATTTATCAAGAGACTGTGTTTGT - ATAGATATGGATGCTTATTTTGCCGCAGTTGAAATGAGAGATAATCCTGCACTGAGAACA - GTTCCTATGGCCGTAGGCTCATCGGCAATGCTGGTAAGCACC - - >00004 1953 2066 len=111 - GTGCGCGAGAAAAAACTACGCGTTAACCGCCAATTTTCACTTCCCCACAGATCTGTCTCG - AGATTCTCGAGTCATTTTTCAAGTTTATTTGTTTGTCAGCGGTTGTTTTAT - ..... - -------- - -**References** - -A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007). - - - - </help> -</tool>
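The `len=` values in the example output of the removed help text can be reproduced from the coordinate lines with a small calculation. The sketch below is an inference from those example numbers, not a description of how `extract` is implemented: it assumes that the sign of the frame column gives the strand, that a non-positive span on a sequence of 40222 bases (the length reported in the long-orfs example) means the region wraps around the origin, and that the three bases of the stop codon are left out. The function name `region_length` is hypothetical.

```python
def region_length(start, end, frame, genome_len, drop_stop=True):
    """Length of an extracted region, following the example in the help text."""
    forward = not frame.startswith("-")
    # On the reverse strand the listed start is the higher coordinate.
    first, last = (start, end) if forward else (end, start)
    length = last - first + 1
    if length <= 0:
        # Assumption: the region wraps around the end of a circular sequence.
        length += genome_len
    if drop_stop:
        # Assumption: the stop codon (3 bases) is excluded from the output.
        length -= 3
    return length


if __name__ == "__main__":
    genome_len = 40222
    for orf_id, start, end, frame in [
        ("00001", 40137, 52, "+2"),
        ("00002", 1319, 1095, "-3"),
        ("00003", 1555, 1391, "-2"),
        ("00004", 1953, 2066, "+3"),
    ]:
        print(orf_id, region_length(start, end, frame, genome_len))
```

Under these assumptions the four coordinate lines give 135, 222, 162 and 111, matching the `len=` fields shown in the example output.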
--- a/additional/glimmer3-long-orfs-wrapper.xml Sun Jun 09 07:57:22 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,125 +0,0 @@ -<tool id="glimmer_long-orfs" name="long ORFs" version="0.1"> - <description>identify long, non-overlapping ORFs (glimmer)</description> - <requirements> - <requirement type="package" version="3.02b">glimmer</requirement> - </requirements> - <command> - long-orfs - -n -t - $cutoff - $inputfile - $output - 2>&1 - </command> - <inputs> - <param name="inputfile" type="data" format="fasta" label="Genome Sequence" help="Dataset missing? See TIP below"/> - <param name='cutoff' type='float' label='cutoff' value='1.5'/> - </inputs> - <outputs> - <data format="tabular" name="output" /> - </outputs> - <tests> - <test> - <param name="inputfile" value='glimmer3/seqTest.fa'/> - <param name='cutoff' value='1.5'/> - <output name="output" file='glimmer3/longORFSTestOutput.dat'/> - </test> - </tests> - <help> - -**What it does** - - This program identifies long, non-overlapping open reading frames (orfs) in a DNA sequence file. - These orfs are very likely to contain genes, and can be used as a set of training sequences - More specifically, among all orfs longer than a minimum length , those that do not overlap any others are output. The start codon used for - each orf is the first possible one. The program, by default, automatically determines the - value that maximizes the number of orfs that are output. With the -t option, the initial - set of candidate orfs also can be filtered using entropy distance, which generally produces - a larger, more accurate training set, particularly for high-GC-content genomes. 
- - - ------ - -**Glimmer Overview** - -:: - -************** ************** ************** ************** -* * * * * * * * -* long-orfs * ===> * Extract * ===> * build-icm * ===> * glimmer3 * -* * * * * * * * -************** ************** ************** ************** - ------ - -**Example** - - -* input:: - - -Genome Sequence - - CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 - GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT - GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT - TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT - TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC - GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA - ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG - AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA - CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA - TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC - AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA - GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC - AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC - CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA - AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC - GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT - ..... 
- - - Cutoff 1.5 - -* output:: - - Sequence file = /home/mohammed/galaxy-central/database/files/000/dataset_34.dat - Excluded regions file = none - Circular genome = true - Initial minimum gene length = 90 bp - Determine optimal min gene length to maximize number of genes - Maximum overlap bases = 30 - Start codons = atg,gtg,ttg - Stop codons = taa,tag,tga - Sequence length = 40222 - Final minimum gene length = 97 - - Putative Genes: - 00001 40137 52 +2 0.892 - 00002 1319 1095 -3 0.654 - 00003 1555 1391 -2 0.793 - 00004 1953 2066 +3 1.078 - 00005 2045 2146 +2 0.919 - 00006 4463 4759 +2 0.985 - 00007 6785 6582 -3 1.033 - 00008 6862 7020 +1 0.915 - 00009 7300 7488 +1 0.900 - 00010 7463 7570 +2 0.912 - 00011 8399 8527 +2 1.044 - 00012 10652 10545 -3 0.895 - 00013 12170 12066 -3 1.108 - 00014 13891 13748 -2 0.998 - 00015 14157 14044 -1 1.026 - 00016 15285 15410 +3 0.928 - 00017 15829 15704 -2 0.949 - - .... - -------- - -**References** - -A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007). - - - </help> -</tool>
--- a/additional/glimmer_acgt_content.xml Sun Jun 09 07:57:22 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,55 +0,0 @@ -<tool id="glimmer_acgt-content" name="ACGT Content" version="0.1"> - <description>of windows in each sequence</description> - <requirements> - <requirement type="package" version="3.02b">glimmer</requirement> - </requirements> - <command> - window-acgt - $percentage - $input_win_len - $input_win_skip - < $infile > $output - - ##TODO prettify the output - </command> - <inputs> - <param name="infile" type="data" format="fasta" label="Genome Sequence"/> - <param name="input_win_len" type="integer" value="10" label="The width of the sliding window"/> - <param name="input_win_skip" type="integer" value="10" label="The number of positions between windows to report"/> - <param name="percentage" type="boolean" truevalue="-p" falsevalue="" checked="true" label="Report percentages instead of counts"/> - </inputs> - <outputs> - <data name="output" format="tabular"/> - </outputs> - <tests> - <test> - <param name="infile" value="streptomyces_coelicolor.dna" /> - <output name="output" file="fasta_tool_convert_from_dna.out" /> - </test> - </tests> - <help> - -**What it does** - -This tool calculates the ACGT-Content from a given Sequence, given a sliding window. - -------- - -**Output** - -Output is in the format: - - window-start window-len A's C's G's T's #other %GC - -Note the last window in the sequence can be shorter than *window-len* if the sequence ends prematurely - - - - -**References** - -A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007). - - - </help> -</tool>
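The output columns listed in the removed help text (window-start, window-len, A's, C's, G's, T's, #other, %GC) can be illustrated with a short Python sketch. It is not a reimplementation of Glimmer's `window-acgt`: the function name, the 1-based window start, and the choice to compute %GC over the A/C/G/T bases only are all assumptions made for illustration.

```python
def window_acgt(seq, win_len, win_skip):
    """Return one tuple per window:
    (window_start, window_len, A, C, G, T, other, percent_gc)."""
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq), win_skip):
        window = seq[start:start + win_len]  # the last window may be shorter
        a, c, g, t = (window.count(base) for base in "ACGT")
        acgt = a + c + g + t
        other = len(window) - acgt
        # Assumption: %GC is taken over unambiguous bases; 0.0 if there are none.
        percent_gc = 100.0 * (g + c) / acgt if acgt else 0.0
        rows.append((start + 1, len(window), a, c, g, t, other, percent_gc))
    return rows


if __name__ == "__main__":
    for row in window_acgt("ACGTACGGNN", 4, 4):
        print("\t".join(str(value) for value in row))
```

As the help text notes, the final window can be shorter than the requested window length when the sequence ends before the window is full.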