changeset 3:d6553277b759 draft

Uploaded
author rnateam
date Tue, 21 Jan 2014 04:57:28 -0500
parents e9b2400cc569
children adcc8f1b6920
files readme.rst
diffstat 1 files changed, 234 insertions(+), 62 deletions(-) [+]
line wrap: on
line diff
--- a/readme.rst	Tue Oct 22 17:35:13 2013 -0400
+++ b/readme.rst	Tue Jan 21 04:57:28 2014 -0500
@@ -1,55 +1,238 @@
+
+
 This package is a Galaxy workflow for BlockClust pipeline.
 
-It uses the Glimmer3 tool (Delcher et al. 2007) trained on a known set of
-genes to generate gene predictions on a new genome, and then calls EMBOSS
-(Rice et al. 2000) to translate the predictions into a FASTA file of
-predicted protein sequences. The workflow requires two input files:
+
+======
+Galaxy
+======
+
+`Galaxy <http://galaxyproject.org/>`_ is an open, web-based platform for data intensive research.
+All tools can be combined in workflows without any need of programming skills. 
+Furthermore the platform can be extended with more tools at any time.
+Each tool has its own information about what it does and how the input is supposed to look like.
+You can make data available for Galaxy by uploading local files or downloading online content.
+Inputfiles, workflowsteps and results are stored in a history where you can view them or reaccess them later.
+It is possible to share workflows and histories with other users or make the public available.
+Saved workflows can be used with new input files or just to rerun an analyses which ensures repeatability.
+
+
+
+Getting Started
+===============
+
+BlockClust can be installed on all common Unix systems. 
+However, it is developed on Linux and I don't have access to OS X. You are welcome to help improving this documentation, just contact_ me.
+
+For any additional information, especially cluster configuration or general Galaxy_ questions, 
+please have a look at the Galaxy Wiki.
+
+- http://wiki.galaxyproject.org/
+
+- http://wiki.galaxyproject.org/Admin/
+
+- http://galaxyproject.org/search/web/
+
+.. _contact: https://github.com/bgruening
+.. _Galaxy: http://galaxyproject.org/
+
+Prerequisites::
+
+* Python 2.6 or 2.7
+* standard C compiler, C++ and Fortran compiler
+* Autotools
+* CMake
+* cairo development files (used for PNG depictions)
+* python development files
+* Java Runtime Environment (JRE, used by OPSIN and NPLS)
+
+To install all of the prerequisites you can run the following command, depending on your OS:
+
+- Debian based systems: apt-get install build-essential gfortran cmake mercurial libcairo2-dev python-dev
+- Fedora: yum install make automake gcc gcc-c++ gcc-gfortran cmake mercurial libcairo2-devel python-devel
+- OS X (MacPorts_): port install gcc cmake automake mercurial cairo-devel
+
+.. _MacPorts: http://www.macports.org/
+
 
-* Nucleotide FASTA file of know gene sequences (training set)
-* Nucleotide FASTA file of genome sequence or assembled contigs
+===================
+Galaxy installation
+===================
+
+
+0. Create a sand-boxed Python using virtualenv_ (not necessary but recommended)::
+
+        wget https://raw.github.com/pypa/virtualenv/master/virtualenv.py
+	python ./virtualenv.py --no-site-packages galaxy_env
+	. ./galaxy_env/bin/activate
+
+.. _virtualenv: http://www.virtualenv.org/
+
+
+1. Clone the latest `Galaxy platform`_::
+
+	hg clone https://bitbucket.org/galaxy/galaxy-central/
+
+.. _Galaxy platform: http://wiki.galaxyproject.org/Admin/Get%20Galaxy
+
+2. Navigate to the galaxy-central folder and update it::
+	
+	cd ~/galaxy-central
+	hg pull
+	hg update
+   
+   This step is not necessary if you have a fresh checkout. Anyway, it is good to know ;)
+
+3. Create folders for toolshed and dependencies::
+
+	mkdir ~/shed_tools
+	mkdir ~/galaxy-central/tool_deps
+
+4. Create configuration file::
+
+	cp ~/galaxy-central/universe_wsgi.ini.sample ~/galaxy-central/universe_wsgi.ini
+
+5. Open universe_wsgi.ini and change the dependencies directory::
+
+	LINUX: gedit ~/galaxy-central/universe_wsgi.ini
+	OS X: open -a TextEdit ~/galaxy-central/universe_wsgi.ini
+
+6. Search for ``tool_dependency_dir = None`` and change it to ``tool_dependency_dir = ./tool_deps``, remove the ``#`` if needed
+
+7. Remove the ``#`` in front of ``tool_config_file`` and ``tool_path``
+
+8. (Re-)Start the galaxy daemon::
+
+	sh run.sh --reload
+	
+   In deamon mode all logs will be written to main.log in your Galaxy Home directory. You can also use::
+   
+	run.sh   
+
+   During the first startup Galaxy will prepare your database. That can take some time. Have a look at the log file if you want to know what happens.
+
+After launching galaxy is accessible via the browser at ``http://localhost:8080/``.
+
+
 
-First an interpolated context model (ICM) is built from the set of known
-genes, preferably from the closest relative organism(s) available. Next this
-ICM model is used to predict genes on the genomic FASTA file. This produces
-a FASTA file of the predicted gene nucleotide sequences, which is translated
-into protein sequences using the EMBOSS tool transeq.
+=======================
+Tool Shed configuration
+=======================
+
+- Register a new user account in your Galaxy instance: Top Panel → User → Register
+- Become an admin
+	- open ``universe_wsgi.ini`` in your favourite text editor (gedit universe_wsgi.ini)
+	- search ``admin_users = None`` and change it to ``admin_users = EMAIL_ADDRESS`` (your Galaxy Username)
+	- remove the ``#`` if needed
+- restart Galaxy
+
+::
+
+	sh run.sh --reload
+
+
+=======================
+BlockClust installation
+=======================
+
+BlockClust will automatically download and compile all requirements, 
+like EDeN, samtools and so on. It can take up to 1-2 hours.
+
+
+Installation via Galaxy API (recommended)
+=========================================
+
+- Generate an `API Key`_
+- Run the installation script::
+	
+	python ./scripts/api/install_tool_shed_repositories.py --api YOUR_API_KEY -l http://localhost:8080 --url http://toolshed.g2.bx.psu.edu/ -o rnateam -r e9b2400cc569 --name blockclust_workflow --tool-deps --repository-deps --panel-section-name ChemicalToolBoX
+
+The -r argument specifies the version of ChemicalToolBoX. You can get the latest revsion number from the 
+`test tool shed`_ or with the following command::
+
+	hg identify http://toolshed.g2.bx.psu.edu/repos/bgruening/chemicaltoolbox
+
+You can watch the installation status under: Top Panel → Admin → Manage installed tool shed repositories
+
+
+.. _API Key: http://wiki.galaxyproject.org/Admin/API#Generate_the_Admin_Account_API_Key
+.. _`test tool shed`: http://testtoolshed.g2.bx.psu.edu/
+
+
+Installation via webbrowser
+===========================
+
+- go to the `admin page`_
+- select *Search and browse tool sheds*
+- Galaxy test tool shed > Sequence Analysis  > blockclust_workflow
+- install chemicaltoolbox
+
+.. _admin page: http://localhost:8080/admin
+
+
 
-Glimmer is intended for finding genes in microbial DNA, especially bacteria,
-archaea, and viruses.
+===============
+Troubleshooting
+===============
+
+If you have any trouble or the installation did not finish properly, do not hesitate to contact me. However, if the 
+installation fails during the Galaxy installation, you can have a look at the `Galaxy wiki`_. If the ChemicalToolBoX installation fails, 
+you can try to run::
+
+	python ./scripts/api/repair_tool_shed_repository.py --api YOUR_API_KEY -l http://localhost:8080 --url http://toolshed.g2.bx.psu.edu/ -o rnateam -r e9b2400cc569 --name blockclust_workflow
+
+That will rerun all failed installation routines. Alternatively, you can navigate to the ChemicalToolBoX repository in 
+your browser and repair manually: 
+Top Panel → Admin → Manage installed tool shed repositories → chemicaltoolbox → Repository Actions → Repair repository
+
+------
+
+
+On slow computers and during the compilation of large software libraries, like R, 
+the Tool Shed can run into a timeout and kills the installation.
+That problem is known and should be fixed in the near future.
+
+If you encouter a timeout or 'hung' during the installation you can increase the ``threadpool_kill_thread_limit`` in your universe_wsgi.ini file.
+
+
+------
+
+**Database locking errors**
 
-See http://www.galaxyproject.org for information about the Galaxy Project.
+Please note that Galaxy per default uses a SQLite database. Sqlite is not intended for production use. 
+With multiple users or complex components, like that workflow, you will see database locking errors. 
+We highly recommend to use PostgreSQL for any kind of production system.
+
+
+.. _Galaxy wiki: http://wiki.galaxyproject.org/
+
+
+Workflows
+=========
+
+An example workflow is located in the `Tool Shed`::
+
+	  http://testtoolshed.g2.bx.psu.edu/view/rnateam/blockclust_workflow
+
+You can install the workflow with the API::
+
+	python ./scripts/api/install_tool_shed_repositories.py --api YOUR_API_KEY -l http://localhost:8080 --url http://toolshed.g2.bx.psu.edu/ -o rnateam -r e9b2400cc569 --name blockclust_workflow --tool-deps --repository-deps --panel-section-name BlockClust
+
+or as described above via webbrowser. You have now successfully installed the workflow, 
+to import it to all your users you need to go to the admin panel, choose the worklow and import it.
+For more information have a look at the Galaxy wiki::
+
+	http://wiki.galaxyproject.org/ToolShedWorkflowSharing#Finding_workflows_in_tool_shed_repositories
+
+Please **note** that Galaxy per default uses a SQLite database. Sqlite is not intended for production use. 
+With multiple users or complex components, like that workflow, you will see database locking errors. 
+We highly recommend to use PostgreSQL for any kind of production system.
+
 
 
 Sample Data
 ===========
 
-As an example, we will use the first public assembly of the 2011 Shiga-toxin
-producing *Escherichia coli* O104:H4 outbreak in Germany. This was part of the
-open-source crowd-sourcing analysis described in Rohde et al. (2011) and here:
-https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki
-
-You can upload this assembly directly into Galaxy using the "Upload File" tool
-with either of these URLs - Galaxy should recognise this is a FASTA file with
-3,057 sequences:
-
-* http://static.xbase.ac.uk/files/results/nick/TY2482/TY2482.fasta.txt
-* https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/blob/master/strains/TY2482/seqProject/BGI/assemblies/NickLoman/TY2482.fasta.txt
-
-This FASTA file ``TY2482.fasta.txt`` was the initial TY-2482 strain assembled
-by Nick Loman from 5 runs of Ion Torrent data released by the BGI, using the
-MIRA 3.2 assembler. It was initially released via his blog,
-http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/
-
-We will also need a training set of known *E. coli* genes, for example the
-model strain *Escherichia coli* str. K-12 substr. MG1655 which is well
-annotated. You can upload the NCBI FASTA file ``NC_000913.ffn`` of the
-gene nucleotide sequences directly into Galaxy via this URL, which Galaxy
-should recognise as a FASTA file with 4,321 sequences:
-
-* ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.ffn
-
-Then run the workflow, which should produce 2,333 predicted genes for the
-TY2482 assembly (two FASTA files, nucleotide and protein sequences).
 
 
 Citation
@@ -61,28 +244,11 @@
 
 P. Videm  at al...
 
-For Glimmer3 please cite:
-
-Delcher, A.L., Bratke, K.A., Powers, E.C., and Salzberg, S.L. (2007)
-Identifying bacterial genes and endosymbiont DNA with Glimmer.
-Bioinformatics 23(6), 673-679.
-http://dx.doi.org/10.1093/bioinformatics/btm009
-
-For EMBOSS please cite:
-
-Rice, P., Longden, I. and Bleasby, A. (2000)
-EMBOSS: The European Molecular Biology Open Software Suite
-Trends in Genetics 16(6), 276-277.
-http://dx.doi.org/10.1016/S0168-9525(00)02024-2
 
 
 Additional References
 =====================
 
-Rohde, H., Qin, J., Cui, Y., Li, D., Loman, N.J., et al. (2011)
-Open-source genomic analysis of shiga-toxin-producing E. coli O104:H4.
-New England Journal of Medicine 365, 718-724.
-http://dx.doi.org/10.1056/NEJMoa1107643
 
 
 Availability
@@ -90,11 +256,11 @@
 
 This workflow is available on the main Galaxy Tool Shed:
 
-http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer_gene_calling_workflow
+ http://testtoolshed.g2.bx.psu.edu/view/rnateam/blockclust_workflow 
 
 Development is being done on github:
 
-https://github.com/bgruening/galaxytools/workflows/glimmer3/
+https://github.com/bgruening/galaxytools/tree/master/workflows/blockclust
 
 
 Dependencies
@@ -102,5 +268,11 @@
 
 These dependencies should be resolved automatically via the Galaxy Tool Shed:
 
-* http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer3
-* http://toolshed.g2.bx.psu.edu/view/devteam/emboss_5
+* http://testtoolshed.g2.bx.psu.edu/view/iuc/package_samtools_0_1_19 
+* http://testtoolshed.g2.bx.psu.edu/view/iuc/package_r_3_0_1
+* http://testtoolshed.g2.bx.psu.edu/view/rnateam/package_segemehl_0_1_6 
+* http://testtoolshed.g2.bx.psu.edu/view/iuc/msa_datatypes 
+* http://testtoolshed.g2.bx.psu.edu/view/iuc/package_infernal_1_1rc4 
+* http://testtoolshed.g2.bx.psu.edu/view/rnateam/blockbuster 
+* http://testtoolshed.g2.bx.psu.edu/view/bgruening/package_eden_1_1
+* http://testtoolshed.g2.bx.psu.edu/view/iuc/package_mcl_12_135