Mercurial > repos > rnateam > blockclust_workflow
changeset 0:ba161910b46f draft
Uploaded
author | rnateam |
---|---|
date | Mon, 21 Oct 2013 12:27:17 -0400 |
parents | |
children | d0ae001db6d7 |
files | readme.rst repository_dependencies.xml |
diffstat | 2 files changed, 116 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/readme.rst Mon Oct 21 12:27:17 2013 -0400
@@ -0,0 +1,106 @@
+This package is a Galaxy workflow for BlockClust pipeline.
+
+It uses the Glimmer3 tool (Delcher et al. 2007) trained on a known set of
+genes to generate gene predictions on a new genome, and then calls EMBOSS
+(Rice et al. 2000) to translate the predictions into a FASTA file of
+predicted protein sequences. The workflow requires two input files:
+
+* Nucleotide FASTA file of know gene sequences (training set)
+* Nucleotide FASTA file of genome sequence or assembled contigs
+
+First an interpolated context model (ICM) is built from the set of known
+genes, preferably from the closest relative organism(s) available. Next this
+ICM model is used to predict genes on the genomic FASTA file. This produces
+a FASTA file of the predicted gene nucleotide sequences, which is translated
+into protein sequences using the EMBOSS tool transeq.
+
+Glimmer is intended for finding genes in microbial DNA, especially bacteria,
+archaea, and viruses.
+
+See http://www.galaxyproject.org for information about the Galaxy Project.
+
+
+Sample Data
+===========
+
+As an example, we will use the first public assembly of the 2011 Shiga-toxin
+producing *Escherichia coli* O104:H4 outbreak in Germany. This was part of the
+open-source crowd-sourcing analysis described in Rohde et al. (2011) and here:
+https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki
+
+You can upload this assembly directly into Galaxy using the "Upload File" tool
+with either of these URLs - Galaxy should recognise this is a FASTA file with
+3,057 sequences:
+
+* http://static.xbase.ac.uk/files/results/nick/TY2482/TY2482.fasta.txt
+* https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/blob/master/strains/TY2482/seqProject/BGI/assemblies/NickLoman/TY2482.fasta.txt
+
+This FASTA file ``TY2482.fasta.txt`` was the initial TY-2482 strain assembled
+by Nick Loman from 5 runs of Ion Torrent data released by the BGI, using the
+MIRA 3.2 assembler. It was initially released via his blog,
+http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/
+
+We will also need a training set of known *E. coli* genes, for example the
+model strain *Escherichia coli* str. K-12 substr. MG1655 which is well
+annotated. You can upload the NCBI FASTA file ``NC_000913.ffn`` of the
+gene nucleotide sequences directly into Galaxy via this URL, which Galaxy
+should recognise as a FASTA file with 4,321 sequences:
+
+* ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.ffn
+
+Then run the workflow, which should produce 2,333 predicted genes for the
+TY2482 assembly (two FASTA files, nucleotide and protein sequences).
+
+
+Citation
+========
+
+If you use this workflow directly, or a derivative of it, or the associated
+wrappers for Galaxy, in work leading to a scientific publication,
+please cite:
+
+P. Videm at al...
+
+For Glimmer3 please cite:
+
+Delcher, A.L., Bratke, K.A., Powers, E.C., and Salzberg, S.L. (2007)
+Identifying bacterial genes and endosymbiont DNA with Glimmer.
+Bioinformatics 23(6), 673-679.
+http://dx.doi.org/10.1093/bioinformatics/btm009
+
+For EMBOSS please cite:
+
+Rice, P., Longden, I. and Bleasby, A. (2000)
+EMBOSS: The European Molecular Biology Open Software Suite
+Trends in Genetics 16(6), 276-277.
+http://dx.doi.org/10.1016/S0168-9525(00)02024-2
+
+
+Additional References
+=====================
+
+Rohde, H., Qin, J., Cui, Y., Li, D., Loman, N.J., et al. (2011)
+Open-source genomic analysis of shiga-toxin-producing E. coli O104:H4.
+New England Journal of Medicine 365, 718-724.
+http://dx.doi.org/10.1056/NEJMoa1107643
+
+
+Availability
+============
+
+This workflow is available on the main Galaxy Tool Shed:
+
+http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer_gene_calling_workflow
+
+Development is being done on github:
+
+https://github.com/bgruening/galaxytools/workflows/glimmer3/
+
+
+Dependencies
+============
+
+These dependencies should be resolved automatically via the Galaxy Tool Shed:
+
+* http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer3
+* http://toolshed.g2.bx.psu.edu/view/devteam/emboss_5
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/repository_dependencies.xml Mon Oct 21 12:27:17 2013 -0400
@@ -0,0 +1,10 @@
+<?xml version="1.0"?>
+<repositories description="This workflow requires a number of different repositories.">
+    <repository changeset_revision="0deb03bc35df" name="blockclust" owner="rnateam" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+    <repository changeset_revision="94926c35b6f3" name="segemehl" owner="rnateam" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+    <repository changeset_revision="2f9d4b518b03" name="blockbuster" owner="rnateam" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+    <!--<repository name="graphclust" owner="rnateam" />-->
+    <repository changeset_revision="5c6344f67ad0" name="infernal" owner="bgruening" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+    <repository changeset_revision="e86bc4b0ddb5" name="package_eden_1_1" owner="bgruening" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+    <repository changeset_revision="7003196be1b1" name="package_mcl_12_135" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+</repositories>
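To see at a glance which repositories the added ``repository_dependencies.xml`` pins, the file can be read with Python's standard ``xml.etree.ElementTree`` module. The following is a minimal sketch, not part of the changeset: the function name ``list_dependencies`` and the embedded string constant are illustrative assumptions, and the XML text is copied from the diff above. The commented-out ``graphclust`` entry is skipped because the default parser discards XML comments.

```python
import xml.etree.ElementTree as ET

# Content of repository_dependencies.xml as added in this changeset.
REPOSITORY_DEPENDENCIES = """<?xml version="1.0"?>
<repositories description="This workflow requires a number of different repositories.">
    <repository changeset_revision="0deb03bc35df" name="blockclust" owner="rnateam" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    <repository changeset_revision="94926c35b6f3" name="segemehl" owner="rnateam" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    <repository changeset_revision="2f9d4b518b03" name="blockbuster" owner="rnateam" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    <!--<repository name="graphclust" owner="rnateam" />-->
    <repository changeset_revision="5c6344f67ad0" name="infernal" owner="bgruening" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    <repository changeset_revision="e86bc4b0ddb5" name="package_eden_1_1" owner="bgruening" toolshed="http://testtoolshed.g2.bx.psu.edu" />
    <repository changeset_revision="7003196be1b1" name="package_mcl_12_135" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu" />
</repositories>
"""


def list_dependencies(xml_text):
    """Return (owner, name, changeset_revision) for each active <repository> entry.

    Hypothetical helper for illustration; it is not part of Galaxy or the Tool Shed.
    """
    root = ET.fromstring(xml_text)
    return [
        (repo.get("owner"), repo.get("name"), repo.get("changeset_revision"))
        for repo in root.findall("repository")
    ]


if __name__ == "__main__":
    for owner, name, revision in list_dependencies(REPOSITORY_DEPENDENCIES):
        print(f"{owner}/{name} @ {revision}")
```

Every entry in this revision points at the test Tool Shed (``http://testtoolshed.g2.bx.psu.edu``), so a similar loop over the ``toolshed`` attribute would be one way to check that before installing.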