changeset 6:a07c49839f31
Uploaded
author | bgruening |
---|---|
date | Sun, 09 Jun 2013 07:57:22 -0400 |
parents | 8ddf54417ade |
children | d2eee6e51790 |
files | additional/gbk2orf.xml additional/gbk_to_orf.py additional/glimmer2gff.py additional/glimmer2gff.xml additional/glimmer3-extract-wrapper.xml additional/glimmer3-long-orfs-wrapper.xml additional/glimmer_acgt_content.xml glimmer_build-icm.xml glimmer_w_icm.xml glimmer_wo_icm.xml readme.rst tool_dependencies.xml |
diffstat | 12 files changed, 821 insertions(+), 135 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/additional/gbk2orf.xml Sun Jun 09 07:57:22 2013 -0400 @@ -0,0 +1,212 @@ +<tool id="gbkToORF" name="Extract ORF" version="0.1"> + <description>from a GenBank file</description> + <command interpreter="python"> + gbk_to_orf.py + -g $infile + -a $aa_output + -n $nc_output + ##TODO translation table, can be extracted from genbank file directly + </command> + <inputs> + <param name="infile" type='data' format="genbank" label="gene bank file"/> + </inputs> + <outputs> + <data name="aa_output" format="fasta" /> + <data name="nc_output" format="fasta" /> + </outputs> + <tests> + <test> + </test> + </tests> + <help> + + +**What it does** +Read a GenBank file and export fasta formatted amino acid and CDS files. + + +----- + +**Example** + * input:: + + Genebankfile + + LOCUS BA000030 9025608 bp DNA linear BCT 21-DEC-2007 + DEFINITION Streptomyces avermitilis MA-4680 DNA, complete genome. + ACCESSION BA000030 AP005021-AP005050 + VERSION BA000030.3 GI:148878541 + DBLINK Project: 189 + KEYWORDS . + SOURCE Streptomyces avermitilis MA-4680 + ORGANISM Streptomyces avermitilis MA-4680 + Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales; + Streptomycineae; Streptomycetaceae; Streptomyces. + REFERENCE 1 + AUTHORS Omura,S., Ikeda,H., Ishikawa,J., Hanamoto,A., Takahashi,C., + Shinose,M., Takahashi,Y., Horikawa,H., Nakazawa,H., Osonoe,T., + Kikuchi,H., Shiba,T., Sakaki,Y. and Hattori,M. + TITLE Genome sequence of an industrial microorganism Streptomyces + avermitilis: deducing the ability of producing secondary + metabolites + JOURNAL Proc. Natl. Acad. Sci. U.S.A. 98 (21), 12215-12220 (2001) + PUBMED 11572948 + REFERENCE 2 + AUTHORS Ikeda,H., Ishikawa,J., Hanamoto,A., Shinose,M., Kikuchi,H., + Shiba,T., Sakaki,Y., Hattori,M. and Omura,S. + TITLE Complete genome sequence and comparative analysis of the industrial + microorganism Streptomyces avermitilis + JOURNAL Nat. Biotechnol. 
21 (5), 526-531 (2003) + PUBMED 12692562 + REFERENCE 3 (bases 1 to 9025608) + AUTHORS Omura,S., Ikeda,H., Ishikawa,J., Hanamoto,A., Takahashi,C., + Shinose,M., Takahashi,Y., Horikawa,H., Nakazawa,H., Osonoe,T., + Kushida,N., Shiba,T., Sakaki,Y. and Hattori,M. + TITLE Direct Submission + JOURNAL Submitted (29-MAR-2002) Contact:S Omura Kitasato University, + Kitasato Institute for Life Sciences; 1-15-1 Kitasato, Sagamihara, + Kanagawa 228-8555, Japan URL + :http://avermitilis.ls.kitasato-u.ac.jp/ + COMMENT On Jun 15, 2007 this sequence version replaced gi:57546753. + This work was done in collaboration with Haruo Ikeda(*1), Jun + Ishikawa(*2), Akiharu Hanamoto(*3), Chigusa Takahashi(*3), Mayumi + Shinose(*3), Hiroshi Horikawa(*4), Hidekazu Nakazawa(*4), Tomomi + Osonoe(*4), Norihiro Kushida(*4), Hisashi Kikuchi(*4), Tadayoshi + Shiba(*5), Yoshiyuki Sakaki(*6,*7), Masahira Hattori(*1,*7) + and Satoshi Omura(*1,*3). + Final finishing process and all annotation were done by H. Ikeda + and J. Ishikawa. + *1 Kitasato Institute for Life Sciences, Kitasato University *2 + National Institute of Infectious Diseases + *3 The Kitasato Institute + *4 National Institute of Technology and Evaluation *5 School of + Science, Kitasato University + *6 Institute of Medical Science, University of Tokyo *7 RIKEN, + Genomic Sciences Center + All the annotated genes identified are available from following + urls. + http://avermitilis.ls.kitasato-u.ac.jp. + FEATURES Location/Qualifiers + source 1..9025608 + /organism="Streptomyces avermitilis MA-4680" + /mol_type="genomic DNA" + /strain="MA-4680" + /db_xref="taxon:227882" + /note="This strain is also named as strain: ATCC 31267, + NCIMB 12804 or NRRL 8165." 
+ gene complement(1380..1811) + /locus_tag="SAV_1" + CDS complement(1380..1811) + /locus_tag="SAV_1" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="BAC67710.1" + /db_xref="GI:29603637" + /translation="MTAEWYVLVEEDTRETKRADGVELRLHRWKLAATQHIAGDQEQA + AAAAEDAALNYMPGVLARHARPGDEPARHAFLTQDGAWLVLLRQRHRECHIRVTTARL + MHTQEEKEAPPKSFKEKLRSALDGPQPPEPAGRPWKPGSET" + + +* output:: + + - aminoAcidOutput + >SAV_1 + MTAEWYVLVEEDTRETKRADGVELRLHRWKLAATQHIAGDQEQAAAAAEDAALNYMPGVL + ARHARPGDEPARHAFLTQDGAWLVLLRQRHRECHIRVTTARLMHTQEEKEAPPKSFKEKL + RSALDGPQPPEPAGRPWKPGSET + >SAV_2 + VPPQGARGTIVSATGSGKTSMAAASTLNCFPEGRILVTVPTLDLLAQTAQAWRAVGHHSP + MIAVCSLENDPVLNERT + >SAV_3 + MDWNFPDDDIFFCGGCGDDDTPDPRVPRQDKALCVRCDRVERQVRRYRITVPRRNAIMRF + QRDVCALCQEGPPTDHCPDAVSFWHIDHDHRCCPPGGSCGRCVRGLLCLPCNATRLPAYE + RLPNVLRDSPRFNTYLNSPPARHPEARPTARDHAGPRDASSYLIDAFFTAADHPEGNALS + S + >SAV_4 + VALTPGGTRVTQWQDRQAIGDMHERRVAAALRARGWTVQPCGQGTYPPAVREALRRTRSA + LRHFPDLIAARGADLITIDAKDRMPSTDTDRYAVSADTVTAGLFFTAAHAPTPLYYVFGD + LKVLTPAEVVHYTAHALRHRSGAFHLVRTEQAHCFDDVFGSAGAAAAA + >SAV_5 + MMLLMAAYVDPRFRPTLWPGTPVPTPELMPLRGARADGEWIVWTPQVRSRSHTVPVPEDF + YLREFMEVDPEDLDAVAALMGAYGHLGGSINTGSWDVDVYERLKELTEREHPRAPFALHG + ELATLFMREAQAAITTWLALRREGGLDALIEPEVSEEELAQWQASNADLEEAWPRDLDHL + RELSLEIRISNLVSELNAALKPFSIGIGGLGDRYPTILAVAFLQLYNHLAEDATIRECAN + ETCRRHFVRQRGRAAYGQNRTSGIKYCTRECARAQAQREHRRRRKQQTTTLQQPPAPGPQ + SHDTSEPTAEGR + >SAV_6 + MISLREHQVEANARIRAWAGFPTRSPVPAQGLRGTVVSATGSGKTITAAWAARECFRGGR + ILVMVPTLDLLVQTAQAWRRVGHNGPMVAACSLEKDEVLEQLGVRTTTNPIQLALWAGHG + PVVVFATYASLVDREDPEDVTGRAKVRGPLEAALAGGQRLYGQTMDGFDLAVVDEAHSTT + GDLGRPWAAIHDNSRIPADFRLYLTATPRILASPRPQKGADGRELEIATMASDPDGPYGE + WLFELGLSEAVERGILAGFEIDVLEIRDPSPALGESEEAQRGRRLALLQTALLEHAAARN + LRTVMTFHQRVEEAAAFAQTMPQTAARLYEAEVSAEALVDAGALPESSIGAEFYELEAGR + HVPPDRVWAAWLCGDHLVAERREVLRQFADGLDAGNKRVHRAFLASVRVLGEGVDIVGER + GVEAICFADTRGSQVEIVQNIGRALRPNPDGTNKTARIIVPVFLQPGENPTDMVASASFA + PLVTVLQGLRSHSERLVEQLASRALTSGQRHVHVKRDEDGRIIGTTTEGEGGQHESEGAV + 
ESALLHFSTPRDATTIAAFLRTRVYRPESLVWLEGYQALLRWRKKNHITGLYAVPYDTET + EAGVTKAFPLGRWVHQQRRTYRAGELDPHRTTLLDEAGMVWEPGDEAWENKLAALRSFHR + AHGHLAPRRDAVWGDADSELVPVGEHMANLRRKDGLGKNPQRAATRATQLAAIDPDWNCP + WPLDWQRHYRVLADLATDEPHSRLPDIQPGVQFEGDDLGKWLQRQRRSWAELSEEQQQRL + TALGVTPAEPPTPTPSAKGGGKAAAFQRGLAALAQWIQREGAHKVVPRGHVEAVVIDGQE + HQHKLGVWISNTKTRRDKLTHDQRTALAALGVEWA + .... + + - orfs + + >SAV_1 + ATGACCGCCGAGTGGTACGTCCTCGTCGAAGAGGACACACGAGAGACCAAGCGCGCCGAC + GGCGTTGAACTCAGATTGCACCGCTGGAAACTGGCGGCCACTCAGCACATCGCAGGAGAT + CAGGAACAGGCCGCCGCCGCGGCCGAGGATGCGGCCCTGAACTACATGCCGGGAGTGCTC + GCTCGGCATGCCCGACCGGGAGACGAACCGGCCCGGCATGCTTTCCTCACCCAGGACGGG + GCCTGGCTGGTGCTCCTCAGGCAGCGGCACCGCGAGTGTCACATACGGGTGACCACTGCC + CGGCTCATGCATACACAGGAAGAGAAGGAGGCCCCGCCGAAAAGCTTCAAGGAGAAACTC + CGCAGCGCCCTGGATGGTCCTCAGCCGCCCGAACCGGCTGGTAGGCCATGGAAGCCGGGC + AGCGAAACCTGA + >SAV_2 + GTGCCCCCTCAGGGAGCCCGTGGCACGATCGTGTCAGCTACCGGGTCCGGCAAAACGAGC + ATGGCCGCCGCGAGCACGCTGAACTGCTTCCCCGAAGGCCGGATCCTCGTGACCGTGCCG + ACCCTGGACCTGCTCGCACAGACCGCCCAGGCGTGGCGGGCAGTCGGCCACCACTCCCCC + ATGATCGCGGTGTGCTCGCTGGAGAACGACCCAGTGCTGAACGAGCGGACCTGA + >SAV_3 + ATGGACTGGAACTTCCCCGACGACGACATCTTCTTCTGCGGCGGGTGCGGCGACGACGAC + ACCCCCGACCCGCGGGTCCCGCGTCAGGACAAGGCCCTGTGCGTCCGCTGCGACAGAGTC + GAACGGCAGGTCCGCCGATACCGGATCACCGTGCCGCGGAGGAACGCGATCATGCGCTTC + CAGCGCGACGTCTGCGCCCTGTGCCAGGAAGGCCCGCCGACCGACCACTGCCCCGATGCC + GTCAGCTTCTGGCACATCGACCACGACCACCGCTGCTGCCCTCCCGGCGGCTCATGCGGG + CGGTGCGTCCGCGGCCTCCTGTGCCTGCCCTGCAACGCCACCCGCCTGCCCGCCTACGAA + CGCCTCCCCAACGTCCTCCGCGACAGCCCTCGCTTCAACACCTACCTCAACAGCCCACCC + GCCCGGCACCCCGAAGCCCGCCCCACCGCCAGGGACCATGCAGGCCCCCGCGACGCATCC + AGCTACCTCATCGACGCCTTTTTCACCGCCGCGGACCATCCCGAGGGGAACGCCCTCAGC + TCCTGA + >SAV_4 + GTGGCACTTACCCCAGGGGGAACCCGAGTGACGCAGTGGCAGGACCGCCAGGCGATAGGC + GACATGCACGAACGTCGGGTGGCGGCCGCGCTGCGCGCCCGCGGCTGGACCGTCCAGCCC + TGCGGACAGGGCACCTACCCGCCCGCCGTACGGGAAGCCCTGCGCCGGACCCGCTCCGCC + CTGCGGCACTTCCCCGACCTCATCGCCGCCCGCGGCGCCGACCTGATCACCATCGACGCC + AAGGACCGCATGCCCAGCACCGACACCGACCGCTACGCCGTCAGCGCCGACACCGTGACC 
+ GCCGGCCTCTTTTTCACCGCGGCCCACGCTCCGACTCCGCTGTACTACGTCTTCGGCGAC + CTGAAGGTCCTCACGCCGGCGGAGGTGGTCCACTACACCGCTCACGCCTTGCGCCACCGC + AGCGGTGCCTTCCACCTCGTACGCACGGAGCAAGCACACTGCTTCGACGACGTCTTCGGA + TCGGCTGGCGCAGCAGCTGCGGCATGA + >SAV_5 + ATGATGCTCCTCATGGCGGCATACGTTGACCCACGCTTTCGTCCTACGCTATGGCCTGGA + ACGCCCGTGCCGACACCGGAGTTGATGCCTCTTCGCGGAGCGCGGGCCGACGGTGAATGG + ATCGTCTGGACCCCGCAGGTCCGCTCCCGCTCGCACACGGTCCCCGTGCCGGAGGACTTC + TACCTGCGCGAGTTCATGGAGGTCGACCCTGAGGACCTCGACGCCGTGGCCGCCCTGATG + GGCGCCTACGGACACCTCGGCGGGAGCATCAACACCGGAAGCTGGGACGTCGACGTCTAC + GAGCGCCTCAAGGAGCTCACGGAGCGCGAACACCCCCGCGCGCCGTTCGCCCTGCACGGC + GAACTGGCCACGCTGTTCATGAGGGAGGCGCAGGCGGCCATCACCACCTGGCTGGCCCTG + CGCCGCGAGGGCGGGCTCGACGCGCTCATCGAGCCCGAGGTGTCCGAGGAAGAACTGGCG + CAGTGGCAAGCGAGCAACGCTGATCTTGAGGAAGCGTGGCCGCGGGACCTGGACCACCTG + CGCGAACTCTCCCTGGAGATCAGGATCAGCAACCTCGTGAGCGAACTGAACGCCGCGCTG + AAGCCGTTCAGCATCGGCATCGGCGGCCTGGGCGACCGCTACCCCACCATCCTCGCTGTG + GCGTTCCTCCAGCTCTACAACCACCTCGCCGAGGACGCCACGATCCGCGAGTGCGCGAAC + GAGACCTGCCGCCGCCACTTCGTACGCCAGCGCGGCCGCGCCGCATACGGGCAGAACCGC + ACCAGCGGCATCAAGTACTGCACCCGCGAATGCGCCCGCGCCCAGGCCCAGCGCGAACAC + CGCCGGCGCCGCAAACAGCAGACCACGACCCTCCAGCAGCCGCCGGCGCCTGGTCCTCAG + TCTCACGACACCTCAGAGCCGACTGCCGAAGGGCGCTGA + ....... + + </help> +</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/additional/gbk_to_orf.py Sun Jun 09 07:57:22 2013 -0400
@@ -0,0 +1,61 @@
+#!/usr/bin/env python
+
+###################################################################
+##
+## gbk2orf.py by Errol Strain (estrain@gmail.com)
+##
+## Read a GenBank file and export fasta formatted amino acid and
+## CDS files
+##
+###################################################################
+
+import sys
+from optparse import OptionParser
+from Bio import SeqIO
+from Bio.Seq import Seq
+from Bio.SeqRecord import SeqRecord
+
+
+## Command line usage
+usage = "usage: %prog -g input.gbk -a aa.fasta -n nuc.fasta"
+p = OptionParser(usage)
+p.add_option("-t","--translate", dest="transtabl",type="int",default=11,
+             help="Translation table used to translate coding regions (default=11)")
+p.add_option("-g","--genbank", dest="gb_file",help="GenBank input file")
+p.add_option("-a","--amino_acid", dest="aa_file",help="Fasta amino acid output")
+p.add_option("-n","--nucleotide", dest="orf_file",help="Fasta nucleotide output")
+(opts, args) = p.parse_args()
+## Do I need this next line?
+if not opts and not args : p.error("Use --help to see usage")
+if len(sys.argv)==1 : p.error("Use --help to see usage")
+
+## Lists to hold SeqRecords
+aalist = []
+nuclist = []
+
+## If the CDS does not have a locus tag the name will be assigned using the
+## order in which it was found
+feat_count=0
+
+## Iterate through genbank records in input file
+for gb_record in SeqIO.parse(open(opts.gb_file,"r"), "genbank") :
+    for (index, feature) in enumerate(gb_record.features) :
+        if feature.type=="CDS" :
+            feat_count = feat_count + 1
+            gene = feature.extract(gb_record.seq)
+            if "locus_tag" in feature.qualifiers :
+                value = feature.qualifiers["locus_tag"][0]
+            else :
+                value = "Index_" + str(feat_count)
+            nuclist.append(SeqRecord(Seq(str(gene)),id=value,name=value))
+            pro=Seq(str(gene.translate(table=opts.transtabl,to_stop=True)))
+            aalist.append(SeqRecord(pro,id=value,name=value))
+
+## Write out lists in fasta format
+aa_handle = open(opts.aa_file,"w")
+SeqIO.write(aalist,aa_handle,"fasta")
+aa_handle.close()
+orf_handle = open(opts.orf_file,"w")
+SeqIO.write(nuclist,orf_handle,"fasta")
+orf_handle.close()
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/additional/glimmer2gff.py Sun Jun 09 07:57:22 2013 -0400
@@ -0,0 +1,36 @@
+#!/usr/bin/env python
+
+"""
+Input: Glimmer3 prediction
+Output: GFF3 file
+Return a GFF3 file with the genes predicted by Glimmer3
+Bjoern Gruening
+
+Note: Its not a full-fledged GFF3 file, its a really simple one.
+
+"""
+
+import sys, re
+
+def __main__():
+    input_file = open(sys.argv[1], 'r')
+
+    print '##gff-version 3\n'
+    for line in input_file:
+        line = line.strip()
+        if line[0] == '>':
+            header = line[1:]
+        else:
+            (id, start, end, frame, score) = re.split('\s+', line)
+            if int(end) > int(start):
+                strand = '+'
+            else:
+                strand = '-'
+                (start, end) = (end, start)
+
+            rest = 'frame=%s;score=%s' % (frame, score)
+            print '\t'.join([header, 'glimmer_prediction', 'predicted_gene', start, end, '.', strand, '.', rest])
+
+
+if __name__ == "__main__" :
+    __main__()
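Reviewer note: the conversion in `glimmer2gff.py` can be sketched in Python 3 as follows. This is a minimal sketch, not the committed code. The function name `glimmer_to_gff3` and the choice to return a list of lines are assumptions made for illustration. It keeps the script's strand rule and column layout, and additionally skips blank lines, on which the committed `line[0]` check would raise an `IndexError`.

```python
import re


def glimmer_to_gff3(lines):
    """Convert Glimmer3 prediction lines into simplified GFF3 lines."""
    out = ["##gff-version 3"]
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank lines are skipped; indexing line[0] on them would fail.
            continue
        if line.startswith(">"):
            header = line[1:]
            continue
        if header is None:
            raise ValueError("prediction line found before any '>' header")
        orf_id, start, end, frame, score = re.split(r"\s+", line)
        # Same strand rule as the script: start > end means reverse strand.
        # Caveat: a forward gene that wraps the origin of a circular genome
        # would be mislabelled by this rule.
        if int(end) > int(start):
            strand = "+"
        else:
            strand = "-"
            start, end = end, start
        attributes = "frame=%s;score=%s" % (frame, score)
        out.append("\t".join([header, "glimmer_prediction", "predicted_gene",
                              start, end, ".", strand, ".", attributes]))
    return out
```

The returned lines can be written out with `print("\n".join(glimmer_to_gff3(open(path))))`, where `path` is a Glimmer3 prediction file.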
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/additional/glimmer2gff.xml Sun Jun 09 07:57:22 2013 -0400
@@ -0,0 +1,63 @@
+<tool id="glimmer2gff" name="Convert Glimmer to GFF" version="0.1">
+    <description>Converts Glimmer Files to GFF Files</description>
+    <command interpreter="python">
+        glimmer2gff.py
+        $input > $output
+    </command>
+    <inputs>
+        <param name="input" type="data" format="tabular" label="Glimmer Output File"/>
+    </inputs>
+    <outputs>
+        <data name="output" type="data" format="gff"/>
+    </outputs>
+    <tests>
+        <test>
+
+        </test>
+    </tests>
+    <help>
+
+**What it does**
+
+Converts a Glimmer3 output File to an GFF Annotation File::
+
+**Example**
+
+Input::
+    >contig00097 sbe.0.234
+    orf00003 2869 497 -2 5.60
+    orf00005 3894 2875 -1 7.05
+    orf00007 4242 4826 +3 8.04
+    orf00010 4846 5403 +1 8.57
+    orf00012 6858 5413 -1 10.87
+    orf00013 6857 7594 +2 3.61
+    orf00014 7751 9232 +2 11.34
+    orf00015 9374 10357 +2 10.66
+    orf00017 10603 11196 +1 13.39
+    orf00021 11303 11911 +2 8.81
+    orf00025 14791 12050 -2 13.51
+    orf00026 15216 16199 +3 6.37
+    orf00028 16333 16935 +1 8.86
+
+
+Output:
+    contig00097 sbe.0.234 glimmer gene 497 2869 . - . -2 5.60
+    contig00097 sbe.0.234 glimmer gene 2875 3894 . - . -1 7.05
+    contig00097 sbe.0.234 glimmer gene 4242 4826 . + . +3 8.04
+    contig00097 sbe.0.234 glimmer gene 4846 5403 . + . +1 8.57
+    contig00097 sbe.0.234 glimmer gene 5413 6858 . - . -1 10.87
+    contig00097 sbe.0.234 glimmer gene 6857 7594 . + . +2 3.61
+    contig00097 sbe.0.234 glimmer gene 7751 9232 . + . +2 11.34
+    contig00097 sbe.0.234 glimmer gene 9374 10357 . + . +2 10.66
+    contig00097 sbe.0.234 glimmer gene 10603 11196 . + . +1 13.39
+    contig00097 sbe.0.234 glimmer gene 11303 11911 . + . +2 8.81
+    contig00097 sbe.0.234 glimmer gene 12050 14791 . - . -2 13.51
+    contig00097 sbe.0.234 glimmer gene 15216 16199 . + . +3 6.37
+    contig00097 sbe.0.234 glimmer gene 16333 16935 . + . +1 8.86
+
+
+-----
+
+
+    </help>
+</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/additional/glimmer3-extract-wrapper.xml Sun Jun 09 07:57:22 2013 -0400 @@ -0,0 +1,127 @@ +<tool id="glimmer_extract" name="glimmer3-extract" version="0.1"> + <description></description> + <requirements> + <requirement type="package" version="3.02b">glimmer</requirement> + </requirements> + <command> + extract + -t + $seqInput + $cordInput > $output + 2> /dev/null + </command> + <inputs> + <param name="seqInput" type="data" format="fasta" label="Genome Sequence" help="Dataset missing? See TIP below"/> + <param name="cordInput" type="data" label="Coordinates" help="Dataset missing? See TIP below"/> + </inputs> + <outputs> + <data format="fasta" name="output" /> + </outputs> + <tests> + <test> + <param name="seqInput" value='glimmer3/seqTest.fa'/> + <param name="cordInput" value='glimmer3/cordTest.txt'/> + <output name="output" file='glimmer3/extractTestOutput.dat'/> + </test> + </tests> + + <help> + +**What it does** + + This program reads a genome sequence and a list of coordinates for it and outputs a multi- + fasta file of the regions specified by the coordinates. 
+ +----- + +**Glimmer Overview** + +:: + +************** ************** ************** ************** +* * * * * * * * +* long-orfs * ===> * Extract * ===> * build-icm * ===> * glimmer3 * +* * * * * * * * +************** ************** ************** ************** + +----- + +**Example** + + +* input :: + + -Genome Sequence + + CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 + GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT + GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT + TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT + TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC + GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA + ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG + AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA + CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA + TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC + AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA + GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC + AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC + CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA + AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC + GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT + ... + + - Coorinates + + 00001 40137 52 +2 0.892 + 00002 1319 1095 -3 0.654 + 00003 1555 1391 -2 0.793 + 00004 1953 2066 +3 1.078 + 00005 2045 2146 +2 0.919 + 00006 4463 4759 +2 0.985 + 00007 6785 6582 -3 1.033 + 00008 6862 7020 +1 0.915 + 00009 7300 7488 +1 0.900 + 00010 7463 7570 +2 0.912 + 00011 8399 8527 +2 1.044 + 00012 10652 10545 -3 0.895 + 00013 12170 12066 -3 1.108 + 00014 13891 13748 -2 0.998 + 00015 14157 14044 -1 1.026 + 00016 15285 15410 +3 0.928 + 00017 15829 15704 -2 0.949 + ... 
+ +* output:: + + >00001 40137 52 len=135 + ATGACACATTTGCTCGTTGCTTTGACCCACTACGAGGCCAGTATCATGATTTCTAGAAAA + ACCCTCTTTTTGACTTCTTCCTCCATGATCCTTGTAGATTTTGAATTTGAAGTTTTTTCT + CATTCCAAAACTCTG + + >00002 1319 1095 len=222 + TTGGCTCGCCGTTTTGGAGTCCGTGCTGGAATGCCTGGCTTCATCTCAAATAAACTTTGT + CCGAGTCTAACGATTGTTCCAGGAAATTACCCTAAATACACTAAAGTCAGTCGCCAATTT + TCACAAATTTTCATGGAATACGATTCGGATGTTGGAATGATGTCATTGGATGAGGCATTT + ATAGATTTGACAGACTATGTGGCAAGTAATACAGAAAAAAGT + + >00003 1555 1391 len=162 + ATGGAGAATCTTGAGATGAAACTGGAATCATCTAGAGATTTATCAAGAGACTGTGTTTGT + ATAGATATGGATGCTTATTTTGCCGCAGTTGAAATGAGAGATAATCCTGCACTGAGAACA + GTTCCTATGGCCGTAGGCTCATCGGCAATGCTGGTAAGCACC + + >00004 1953 2066 len=111 + GTGCGCGAGAAAAAACTACGCGTTAACCGCCAATTTTCACTTCCCCACAGATCTGTCTCG + AGATTCTCGAGTCATTTTTCAAGTTTATTTGTTTGTCAGCGGTTGTTTTAT + ..... + +------- + +**References** + +A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007). + + + + </help> +</tool>
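Reviewer note: the coordinate handling implied by the `glimmer3-extract` help example can be sketched as below. This is only an illustration under stated assumptions, not the logic of the real `extract` binary. The function name `extract_region`, taking the strand from the sign of the frame column, and the `drop_stop` flag are all hypothetical. The sketch treats the genome as circular and trims the final three bases, which is consistent with the lengths shown in the example (for instance `40137 52` on a 40222 bp sequence gives 86 + 52 - 3 = 135).

```python
# Complement table for reverse-strand regions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(genome, start, end, frame, drop_stop=True):
    """Return the region start..end (1-based, inclusive) of a circular genome.

    Assumption of this sketch: the strand is read from the sign of `frame`
    (e.g. "+2" or "-3"); the real tool may decide direction differently.
    """
    forward = not frame.startswith("-")
    # On the reverse strand the coordinates are listed high-to-low.
    lo, hi = (start, end) if forward else (end, start)
    if lo <= hi:
        seq = genome[lo - 1:hi]
    else:
        # The region wraps around the origin of the circular sequence.
        seq = genome[lo - 1:] + genome[:hi]
    if not forward:
        seq = seq.translate(_COMPLEMENT)[::-1]
    if drop_stop:
        # Drop the trailing three bases (the stop codon).
        seq = seq[:-3]
    return seq
```

Keeping the wraparound case explicit makes it easy to check the `len=` values in the help text by hand.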
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/additional/glimmer3-long-orfs-wrapper.xml Sun Jun 09 07:57:22 2013 -0400 @@ -0,0 +1,125 @@ +<tool id="glimmer_long-orfs" name="long ORFs" version="0.1"> + <description>identify long, non-overlapping ORFs (glimmer)</description> + <requirements> + <requirement type="package" version="3.02b">glimmer</requirement> + </requirements> + <command> + long-orfs + -n -t + $cutoff + $inputfile + $output + 2>&1 + </command> + <inputs> + <param name="inputfile" type="data" format="fasta" label="Genome Sequence" help="Dataset missing? See TIP below"/> + <param name='cutoff' type='float' label='cutoff' value='1.5'/> + </inputs> + <outputs> + <data format="tabular" name="output" /> + </outputs> + <tests> + <test> + <param name="inputfile" value='glimmer3/seqTest.fa'/> + <param name='cutoff' value='1.5'/> + <output name="output" file='glimmer3/longORFSTestOutput.dat'/> + </test> + </tests> + <help> + +**What it does** + + This program identifies long, non-overlapping open reading frames (orfs) in a DNA sequence file. + These orfs are very likely to contain genes, and can be used as a set of training sequences + More specifically, among all orfs longer than a minimum length , those that do not overlap any others are output. The start codon used for + each orf is the first possible one. The program, by default, automatically determines the + value that maximizes the number of orfs that are output. With the -t option, the initial + set of candidate orfs also can be filtered using entropy distance, which generally produces + a larger, more accurate training set, particularly for high-GC-content genomes. 
+ + + +----- + +**Glimmer Overview** + +:: + +************** ************** ************** ************** +* * * * * * * * +* long-orfs * ===> * Extract * ===> * build-icm * ===> * glimmer3 * +* * * * * * * * +************** ************** ************** ************** + +----- + +**Example** + + +* input:: + + -Genome Sequence + + CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 + GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT + GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT + TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT + TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC + GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA + ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG + AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA + CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA + TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC + AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA + GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC + AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC + CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA + AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC + GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT + ..... 
+ + - Cutoff 1.5 + +* output:: + + Sequence file = /home/mohammed/galaxy-central/database/files/000/dataset_34.dat + Excluded regions file = none + Circular genome = true + Initial minimum gene length = 90 bp + Determine optimal min gene length to maximize number of genes + Maximum overlap bases = 30 + Start codons = atg,gtg,ttg + Stop codons = taa,tag,tga + Sequence length = 40222 + Final minimum gene length = 97 + + Putative Genes: + 00001 40137 52 +2 0.892 + 00002 1319 1095 -3 0.654 + 00003 1555 1391 -2 0.793 + 00004 1953 2066 +3 1.078 + 00005 2045 2146 +2 0.919 + 00006 4463 4759 +2 0.985 + 00007 6785 6582 -3 1.033 + 00008 6862 7020 +1 0.915 + 00009 7300 7488 +1 0.900 + 00010 7463 7570 +2 0.912 + 00011 8399 8527 +2 1.044 + 00012 10652 10545 -3 0.895 + 00013 12170 12066 -3 1.108 + 00014 13891 13748 -2 0.998 + 00015 14157 14044 -1 1.026 + 00016 15285 15410 +3 0.928 + 00017 15829 15704 -2 0.949 + + .... + +------- + +**References** + +A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007). + + + </help> +</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/additional/glimmer_acgt_content.xml Sun Jun 09 07:57:22 2013 -0400
@@ -0,0 +1,55 @@
+<tool id="glimmer_acgt-content" name="ACGT Content" version="0.1">
+    <description>of windows in each sequence</description>
+    <requirements>
+        <requirement type="package" version="3.02b">glimmer</requirement>
+    </requirements>
+    <command>
+        window-acgt
+        $percentage
+        $input_win_len
+        $input_win_skip
+        < $infile > $output
+
+        ##TODO prettify the output
+    </command>
+    <inputs>
+        <param name="infile" type="data" format="fasta" label="Genome Sequence"/>
+        <param name="input_win_len" type="integer" value="10" label="The width of the sliding window"/>
+        <param name="input_win_skip" type="integer" value="10" label="The number of positions between windows to report"/>
+        <param name="percentage" type="boolean" truevalue="-p" falsevalue="" checked="true" label="Report percentages instead of counts"/>
+    </inputs>
+    <outputs>
+        <data name="output" format="tabular"/>
+    </outputs>
+    <tests>
+        <test>
+            <param name="infile" value="streptomyces_coelicolor.dna" />
+            <output name="output" file="fasta_tool_convert_from_dna.out" />
+        </test>
+    </tests>
+    <help>
+
+**What it does**
+
+This tool calculates the ACGT-Content from a given Sequence, given a sliding window.
+
+-------
+
+**Output**
+
+Output is in the format:
+
+    window-start window-len A's C's G's T's #other %GC
+
+Note the last window in the sequence can be shorter than *window-len* if the sequence ends prematurely
+
+
+
+
+**References**
+
+A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007).
+
+
+    </help>
+</tool>
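Reviewer note: the output format described in the `glimmer_acgt_content.xml` help (window start, window length, A/C/G/T counts, other characters, %GC) can be illustrated with a short Python sketch. This is not the `window-acgt` implementation. The function name `window_acgt` is hypothetical, only raw counts are reported (the `-p` percentage mode is left out), and computing %GC over the full window length including non-ACGT characters is an assumption.

```python
def window_acgt(seq, win_len, win_skip):
    """Return one row per window: (start, length, A, C, G, T, other, %GC).

    Starts are 1-based. The last window may be shorter than win_len
    when the sequence ends before the window is full.
    """
    rows = []
    for offset in range(0, len(seq), win_skip):
        window = seq[offset:offset + win_len].upper()
        a, c, g, t = (window.count(base) for base in "ACGT")
        other = len(window) - (a + c + g + t)
        # Assumption: %GC is taken over the whole window, other chars included.
        gc_percent = 100.0 * (c + g) / len(window)
        rows.append((offset + 1, len(window), a, c, g, t, other, gc_percent))
    return rows
```

With `win_len` equal to `win_skip`, as in the wrapper's defaults of 10 and 10, the windows tile the sequence without overlap.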
--- a/glimmer_build-icm.xml Fri Jun 07 10:02:12 2013 -0400 +++ b/glimmer_build-icm.xml Sun Jun 09 07:57:22 2013 -0400 @@ -1,5 +1,5 @@ -<tool id="glimmer_build-icm" name="ICM builder" version="0.1"> - <description>(glimmer)</description> +<tool id="glimmer_build-icm" name="Glimmer ICM builder" version="0.2"> + <description></description> <requirements> <requirement type="package" version="3.02b">glimmer</requirement> </requirements> @@ -74,40 +74,41 @@ **What it does** - This program constructs an interpolated context model (ICM) from an input set of sequences. - This model can be used by Glimmer3 to predict genes. +This program constructs an interpolated context model (ICM) from an input set of sequences. + +This model can be used by Glimmer3 to predict genes. + +**TIP** To extract CDS from a GenBank file use the tool *Extract ORF from a GenBank file*. ----- - **Example** -* input:: +*Input*:: - -Genome Sequence + - Genome Sequence - >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 - GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT - GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT - TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT - TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC - GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA - ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG - AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA - CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA - TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC - AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA - GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC - AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC - CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA - AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC - GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT - ..... 
+ >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 + GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT + GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT + TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT + TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC + GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA + ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG + AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA + CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA + TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC + AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA + GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC + AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC + CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA + AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC + GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT + ..... -* output: +*Output*:: interpolated context model (ICM) - ------- **References**
--- a/glimmer_w_icm.xml Fri Jun 07 10:02:12 2013 -0400 +++ b/glimmer_w_icm.xml Sun Jun 09 07:57:22 2013 -0400 @@ -1,4 +1,4 @@ -<tool id="glimmer_knowlegde-based" name="Glimmer3" version="0.1"> +<tool id="glimmer_knowlegde-based" name="Glimmer3" version="0.2"> <description>Predict ORFs in prokaryotic genomes (knowlegde-based)</description> <requirements> <requirement type="package" version="3.02b">glimmer</requirement> @@ -8,7 +8,8 @@ <command> #import tempfile, os #set $temp = tempfile.NamedTemporaryFile( delete=False ) - # $temp.close() + #silent $temp.close() + #set $temp = $temp.name glimmer3 --max_olap $max_olap @@ -32,16 +33,16 @@ $temp 2>&1; ## convert prediction to FASTA sequences - \$GLIMMER_SCRIPT_PATH/glimmer2seq.py $temp".predict" $seq_input $genes_output + \$GLIMMER_SCRIPT_PATH/glimmer2seq.py $temp".predict" $seq_input $genes_output; #if $report: - mv $temp".predict" $prediction; + mv $temp".predict" $report_output; #else: rm $temp".predict"; #end if #if $detailed_report: - mv $temp".detail" $detailed; + mv $temp".detail" $detailed_output; #else: rm $temp".detail"; #end if @@ -99,10 +100,10 @@ </inputs> <outputs> <data name="genes_output" format="fasta" label="Glimmer3 on ${on_string} (Gene Prediction FASTA)" /> - <data name="prediction" format="txt" label="Glimmer3 on ${on_string} (Gene Prediction table)"> + <data name="report_output" format="txt" label="Glimmer3 on ${on_string} (Gene Prediction table)"> <filter>report == True</filter> </data> - <data name="detailed" format="txt" label="Glimmer3 on ${on_string} (detailed report)"> + <data name="detailed_output" format="txt" label="Glimmer3 on ${on_string} (detailed report)"> <filter>detailed_report == True</filter> </data> </outputs> @@ -123,102 +124,85 @@ **What it does** - This is the main program that makes gene preditions based on an interpolated context model (ICM). - The ICM can be generated either with a de novo prediction (see glimmer Overview) or with extracted CDS from related organisms. 
+This is the main program that makes gene preditions based on an interpolated context model (ICM). ------ - -**TIP** To extract CDS from a GenBank file use the tool *Extract ORF from a GenBank file*. +The ICM can be generated with extracted CDS from related organisms (ICM builder). If you can't generate an ICM model you can use the non knowlegde-based Glimmer with a de novo prediction. ----- -**Glimmer Overview** - -:: - -************** ************** ************** ************** -* * * * * * * * -* long-orfs * ===> * Extract * ===> * build-icm * ===> * glimmer3 * -* * * * * * * * -************** ************** ************** ************** - **Example** -* input:: +*Input*:: + + - interpolated context model (ICM): Use the 'Glimmer ICM builder' tool to create one + - Genome Sequence in FASTA format - -Genome Sequence + >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 + GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT + GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT + TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT + TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC + GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA + ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG + AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA + CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA + TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC + AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA + GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC + AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC + CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA + AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC + GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT + ..... 
- CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 - GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT - GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT - TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT - TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC - GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA - ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG - AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA - CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA - TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC - AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA - GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC - AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC - CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA - AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC - GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT - ..... +*Output*:: - - - interpolated context model (ICM) 92: glimmer3-build-icm on data 89 - - maximum overlap length 50 - - minimum gene length. 90 - - threshold score 30 - - linear True - -* output:: + - FASTA file with predicted proteins + - Glimmer prediction file (optional) - .predict file - >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7. - orf00001 40137 52 +2 8.68 - orf00004 603 34 -1 2.91 - orf00006 1289 1095 -3 3.16 - orf00007 1555 1391 -2 2.33 - orf00008 1809 1576 -1 1.02 - orf00010 1953 2066 +3 3.09 - orf00011 2182 2304 +1 0.89 - orf00013 2390 2521 +2 0.60 - orf00018 2570 3073 +2 2.54 - orf00020 3196 3747 +1 2.91 - orf00022 3758 4000 +2 0.83 - orf00023 4399 4157 -2 1.31 - orf00025 4463 4759 +2 2.92 - orf00026 4878 5111 +3 0.78 - orf00027 5468 5166 -3 1.64 - orf00029 5590 5832 +1 0.29 - orf00032 6023 6226 +2 6.02 - orf00033 6217 6336 +1 3.09 - ........ - + >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7. 
+ orf00001 40137 52 +2 8.68 + orf00004 603 34 -1 2.91 + orf00006 1289 1095 -3 3.16 + orf00007 1555 1391 -2 2.33 + orf00008 1809 1576 -1 1.02 + orf00010 1953 2066 +3 3.09 + orf00011 2182 2304 +1 0.89 + orf00013 2390 2521 +2 0.60 + orf00018 2570 3073 +2 2.54 + orf00020 3196 3747 +1 2.91 + orf00022 3758 4000 +2 0.83 + orf00023 4399 4157 -2 1.31 + orf00025 4463 4759 +2 2.92 + orf00026 4878 5111 +3 0.78 + orf00027 5468 5166 -3 1.64 + orf00029 5590 5832 +1 0.29 + orf00032 6023 6226 +2 6.02 + orf00033 6217 6336 +1 3.09 + ........ - .details file - >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7. - Sequence length = 40222 + - Glimmer detailed report (optional) - ----- Start ----- --- Length ---- ------------- Scores ------------- - ID Frame of Orf of Gene Stop of Orf of Gene Raw InFrm F1 F2 F3 R1 R2 R3 NC - 0001 +2 40137 40137 52 135 135 9.26 96 - 96 - - 3 - 0 - 0002 +1 58 64 180 120 114 5.01 69 69 - - 30 - - 0 - +3 300 309 422 120 111 -0.68 20 - - 20 38 - - 41 - +3 423 432 545 120 111 1.29 21 - 51 21 13 - 8 5 - 0003 +2 401 416 595 192 177 2.51 93 - 93 - 5 - - 1 - 0004 -1 645 552 34 609 516 2.33 99 - - - 99 - - 0 - +1 562 592 762 198 168 -2.54 1 1 - - - - - 98 - +1 763 772 915 150 141 -1.34 1 1 - - - - 86 11 - +3 837 846 1007 168 159 1.35 28 - 50 28 - - 17 3 - 0005 -3 1073 977 654 417 321 0.52 84 - - - - - 84 15 - 0006 -3 1373 1319 1095 276 222 3.80 99 - - - - - 99 0 - 0007 -2 1585 1555 1391 192 162 2.70 98 - - - - 98 - 1 - 0008 -1 1812 1809 1576 234 231 1.26 94 - - - 94 - - 5 - 0009 +2 1721 1730 1945 222 213 0.68 80 - 80 - - - - 19 - ..... + >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7. 
+ Sequence length = 40222 + + ----- Start ----- --- Length ---- ------------- Scores ------------- + ID Frame of Orf of Gene Stop of Orf of Gene Raw InFrm F1 F2 F3 R1 R2 R3 NC + 0001 +2 40137 40137 52 135 135 9.26 96 - 96 - - 3 - 0 + 0002 +1 58 64 180 120 114 5.01 69 69 - - 30 - - 0 + +3 300 309 422 120 111 -0.68 20 - - 20 38 - - 41 + +3 423 432 545 120 111 1.29 21 - 51 21 13 - 8 5 + 0003 +2 401 416 595 192 177 2.51 93 - 93 - 5 - - 1 + 0004 -1 645 552 34 609 516 2.33 99 - - - 99 - - 0 + +1 562 592 762 198 168 -2.54 1 1 - - - - - 98 + +1 763 772 915 150 141 -1.34 1 1 - - - - 86 11 + +3 837 846 1007 168 159 1.35 28 - 50 28 - - 17 3 + 0005 -3 1073 977 654 417 321 0.52 84 - - - - - 84 15 + 0006 -3 1373 1319 1095 276 222 3.80 99 - - - - - 99 0 + 0007 -2 1585 1555 1391 192 162 2.70 98 - - - - 98 - 1 + 0008 -1 1812 1809 1576 234 231 1.26 94 - - - 94 - - 5 + 0009 +2 1721 1730 1945 222 213 0.68 80 - 80 - - - - 19 + ..... -------
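For readers who want to post-process the prediction file shown in the help text above, the following is a minimal Python sketch, not part of this changeset. It assumes the five whitespace-separated columns seen in the example (ORF identifier, start, end, reading frame, score) and a `>` header line per sequence. The names `Prediction` and `parse_predict` are hypothetical and do not come from the wrapper or from Glimmer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Prediction:
    """One gene call from a Glimmer3 prediction file (hypothetical container)."""
    seq_id: str
    orf_id: str
    start: int
    end: int
    strand: str   # "+" or "-", taken from the sign of the reading frame
    frame: int    # 1, 2 or 3
    score: float


def parse_predict(lines: Iterable[str]) -> Iterator[Prediction]:
    """Yield one Prediction per ORF line; assumes the layout shown in the example."""
    seq_id = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: keep only the first token as the sequence identifier.
            seq_id = line[1:].split()[0]
            continue
        fields = line.split()
        # Skip anything that is not a five-column ORF row (e.g. a "....." filler).
        if len(fields) != 5 or not fields[0].startswith("orf"):
            continue
        orf_id, start, end, frame, score = fields
        yield Prediction(
            seq_id=seq_id,
            orf_id=orf_id,
            start=int(start),
            end=int(end),
            strand=frame[0],
            frame=int(frame[1:]),
            score=float(score),
        )


SAMPLE = """>CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7.
orf00001 40137 52 +2 8.68
orf00004 603 34 -1 2.91
........
"""

predictions = list(parse_predict(SAMPLE.splitlines()))
for p in predictions:
    print(p.orf_id, p.start, p.end, p.strand, p.score)
```

Coordinates are kept exactly as reported. In the example, reverse-strand calls have a start larger than the end, and `orf00001` has a start larger than its end on the forward strand, which suggests a gene spanning the origin of a circular sequence; any conversion to another format would need to handle both cases explicitly.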
--- a/glimmer_wo_icm.xml Fri Jun 07 10:02:12 2013 -0400 +++ b/glimmer_wo_icm.xml Sun Jun 09 07:57:22 2013 -0400 @@ -1,4 +1,4 @@ -<tool id="glimmer_not-knowlegde-based" name="Glimmer3" version="0.1"> +<tool id="glimmer_not-knowlegde-based" name="Glimmer3" version="0.2"> <description>Predict ORFs in prokaryotic genomes (not knowledge-based)</description> <requirements> <requirement type="package" version="3.02b">glimmer</requirement> @@ -58,9 +58,21 @@ ----- +**Glimmer Overview** + +:: + +************** ************** ************** ************** +* * * * * * * * +* long-orfs * ===> * Extract * ===> * build-icm * ===> * glimmer3 * +* * * * * * * * +************** ************** ************** ************** + +----- + **Example** -Suppose you have the following DNA formatted sequences:: +Suppose you have the following DNA sequences:: >SQ Sequence 8667507 BP; 1203558 A; 3121252 C; 3129638 G; 1213059 T; 0 other; cccgcggagcgggtaccacatcgctgcgcgatgtgcgagcgaacacccgggctgcgcccg @@ -68,8 +80,9 @@ cccgcttcgcgggcttggtgacgctccgtccgctgcgcttccggagttgcggggcttcgc cccgctaaccctgggcctcgcttcgctccgccttgggcctgcggcgggtccgctgcgctc ccccgcctcaagggcccttccggctgcgcctccaggacccaaccgcttgcgcgggcctgg + ....... -Running this tool will produce this:: +Running this tool will produce a FASTA file with predicted genes and Glimmer output files like the following:: >SQ Sequence 8667507 BP; 1203558 A; 3121252 C; 3129638 G; 1213059 T; 0 other; orf00001 577 699 +1 5.24
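The four stages in the Glimmer Overview diagram (long-orfs, extract, build-icm, glimmer3) can be sketched as a list of shell commands. This is an illustration only and not what the wrapper executes: the flags are written as I recall them from the example script shipped with Glimmer 3.02 and should be checked against each program's help output, the installed binaries may carry different names (for example in distribution packages), and the function name `glimmer_pipeline_commands` and the `tag` file prefix are hypothetical. The default values mirror the parameters listed in the older help text (maximum overlap 50, minimum gene length 90, threshold 30).

```python
import shlex
from typing import List


def glimmer_pipeline_commands(
    genome: str,
    tag: str,
    max_overlap: int = 50,
    min_gene_len: int = 90,
    threshold: int = 30,
    linear: bool = False,
) -> List[str]:
    """Return the four shell commands of a de novo Glimmer3 run (assumed flags)."""
    g = shlex.quote(genome)
    longorfs = shlex.quote(f"{tag}.longorfs")
    train = shlex.quote(f"{tag}.train")
    icm = shlex.quote(f"{tag}.icm")
    out = shlex.quote(tag)

    glimmer_opts = f"-o{max_overlap} -g{min_gene_len} -t{threshold}"
    if linear:
        # Assumed flag: treat the sequence as linear rather than circular.
        glimmer_opts += " -l"

    return [
        # 1. Find long, non-overlapping ORFs to use as a training set.
        f"long-orfs -n -t 1.15 {g} {longorfs}",
        # 2. Extract the nucleotide sequences of those ORFs.
        f"extract -t {g} {longorfs} > {train}",
        # 3. Build the interpolated context model from the training sequences.
        f"build-icm -r {icm} < {train}",
        # 4. Predict genes with the model.
        f"glimmer3 {glimmer_opts} {g} {icm} {out}",
    ]


commands = glimmer_pipeline_commands("genome.fasta", "run1")
for cmd in commands:
    print(cmd)
```

The function only builds strings, so it can be inspected without Glimmer installed; running the commands is left to the caller.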
--- a/readme.rst Fri Jun 07 10:02:12 2013 -0400 +++ b/readme.rst Sun Jun 09 07:57:22 2013 -0400 @@ -1,46 +1,55 @@ -Galaxy wrapper for RepeatMasker -=============================== +======================================= +Galaxy wrapper for Glimmer gene calling +======================================= This wrapper is copyright 2012-2013 by Björn Grüning. -This is a wrapper for the command line tool of Glimmer3. -http://www.cbcb.umd.edu/software/glimmer/ +This is a wrapper for the command line tool of Glimmer3_. + +.. _Glimmer3: http://www.cbcb.umd.edu/software/glimmer/ Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA. +S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548. + A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER, Nucleic Acids Research 27:23 (1999), 4636-4641. -S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548. + A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007). - +============ Installation ============ -Since version 0.2 the recommended installation procedure is via the Galaxy Tool Shed. +Since version 0.2 the recommended installation procedure is via the `Galaxy Tool Shed`_. + +.. 
_`Galaxy Tool Shed`: http://toolshed.g2.bx.psu.edu/view/bjoern-gruening/glimmer3 -To install Glimmer3 manually, please download Glimmer3 from http://www.cbcb.umd.edu/software/glimmer/glimmer302.tar.gz +To install Glimmer3 manually, please download Glimmer3 from:: + + http://www.cbcb.umd.edu/software/glimmer/glimmer302.tar.gz + and follow the installation instructions. You can also use packages from your distribution like http://packages.debian.org/stable/science/tigr-glimmer To install the wrapper, copy the glimmer3 folder into the Galaxy tools folder and modify the tool_conf.xml file to make the tools available to Galaxy. -For example: +For example:: -<tool file="gene_prediction/tools/glimmer3/glimmer_w_icm.xml" /> -<tool file="gene_prediction/tools/glimmer3/glimmer_wo_icm.xml" /> -<tool file="gene_prediction/tools/glimmer3/glimmer_build-icm.xml" /> + <tool file="gene_prediction/tools/glimmer3/glimmer_w_icm.xml" /> + <tool file="gene_prediction/tools/glimmer3/glimmer_wo_icm.xml" /> + <tool file="gene_prediction/tools/glimmer3/glimmer_build-icm.xml" /> - +======= History ======= - v0.1: Initial public release - v0.2: Add tool shed integration - +=============================== Wrapper Licence (MIT/BSD style) ===============================
--- a/tool_dependencies.xml Fri Jun 07 10:02:12 2013 -0400 +++ b/tool_dependencies.xml Sun Jun 09 07:57:22 2013 -0400 @@ -1,7 +1,7 @@ <?xml version="1.0"?> <tool_dependency> <package name="biopython" version="1.61"> - <repository changeset_revision="e87f0c6897a8" name="package_biopython_1_61" owner="bgruening" toolshed="http://testtoolshed.g2.bx.psu.edu" /> + <repository changeset_revision="627c7b41b970" name="package_biopython_1_61" owner="biopython" toolshed="http://testtoolshed.g2.bx.psu.edu" /> </package> <set_environment version="1.0"> <environment_variable action="set_to" name="GLIMMER_SCRIPT_PATH">$REPOSITORY_INSTALL_DIR</environment_variable>