changeset 13:c87b166cbfe1 draft
planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit 777167fc30406b792d0cf924753069526f8b3e5e-dirty
author | pjbriggs |
---|---|
date | Mon, 18 Jun 2018 10:05:58 -0400 |
parents | fb2af52d67d1 |
children | ed175a4b247f |
files | README amplicon_analysis_pipeline.py tool_dependencies.xml |
diffstat | 3 files changed, 5 insertions(+), 252 deletions(-) |
--- a/README Mon Jun 18 07:54:54 2018 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,252 +0,0 @@
-Amplicon_analysis-galaxy
-========================
-
-A Galaxy tool wrapper to Mauro Tutino's ``Amplicon_analysis`` pipeline
-script at https://github.com/MTutino/Amplicon_analysis
-
-The pipeline can analyse paired-end 16S rRNA data from Illumina Miseq
-(Casava >= 1.8) and performs the following operations:
-
- * QC and clean up of input data
- * Removal of singletons and chimeras and building of OTU table
-   and phylogenetic tree
- * Beta and alpha diversity of analysis
-
-Usage documentation
-===================
-
-Usage of the tool (including required inputs) is documented within
-the ``help`` section of the tool XML.
-
-Installing the tool in a Galaxy instance
-========================================
-
-The following sections describe how to install the tool files,
-dependencies and reference data, and how to configure the Galaxy
-instance to detect the dependencies and reference data correctly
-at run time.
-
-1. Install the dependencies
----------------------------
-
-The ``install_tool_deps.sh`` script can be used to fetch and install the
-dependencies locally, for example::
-
-    install_tool_deps.sh /path/to/local_tool_dependencies
-
-This can take some time to complete. When finished it should have
-created a set of directories containing the dependencies under the
-specified top level directory.
-
-2. Install the tool files
--------------------------
-
-The core tool is hosted on the Galaxy toolshed, so it can be installed
-directly from there (this is the recommended route):
-
- * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/
-
-Alternatively it can be installed manually; in this case there are two
-files to install:
-
- * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
- * ``amplicon_analysis_pipeline.py`` (the Python wrapper script)
-
-Put these in a directory that is visible to Galaxy (e.g. a
-``tools/Amplicon_analysis/`` folder), and modify the ``tools_conf.xml``
-file to tell Galaxy to offer the tool by adding the line e.g.::
-
-    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
-
-3. Install the reference data
------------------------------
-
-The script ``References.sh`` from the pipeline package at
-https://github.com/MTutino/Amplicon_analysis can be run to install
-the reference data, for example::
-
-    cd /path/to/pipeline/data
-    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
-    /bin/bash ./References.sh
-
-will install the data in ``/path/to/pipeline/data``.
-
-**NB** The final amount of data downloaded and uncompressed will be
-around 6GB.
-
-4. Configure dependencies and reference data in Galaxy
-------------------------------------------------------
-
-The final steps are to make your Galaxy installation aware of the
-tool dependencies and reference data, so it can locate them both when
-the tool is run.
-
-To target the tool dependencies installed previously, add the
-following lines to the ``dependency_resolvers_conf.xml`` file in the
-Galaxy ``config`` directory::
-
-    <dependency_resolvers>
-    ...
-      <galaxy_packages base_path="/path/to/local_tool_dependencies" />
-      <galaxy_packages base_path="/path/to/local_tool_dependencies" versionless="true" />
-    ...
-    </dependency_resolvers>
-
-(NB it is recommended to place these *before* the ``<conda ... />``
-resolvers)
-
-(If you're not familiar with dependency resolvers in Galaxy then
-see the documentation at
-https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html
-for more details.)
-
-The tool locates the reference data via an environment variable called
-``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to set to the parent
-directory where the reference data has been installed.
-
-There are various ways to do this, depending on how your Galaxy
-installation is configured:
-
- * **For local instances:** add a line to set it in the
-   ``config/local_env.sh`` file of your Galaxy installation, e.g.::
-
-       export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data
-
- * **For production instances:** set the value in the ``job_conf.xml``
-   configuration file, e.g.::
-
-       <destination id="amplicon_analysis">
-          <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
-       </destination>
-
-   and then specify that the pipeline tool uses this destination::
-
-       <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>
-
-   (For more about job destinations see the Galaxy documentation at
-   https://galaxyproject.org/admin/config/jobs/#job-destinations)
-
-5. Enable rendering of HTML outputs from pipeline
--------------------------------------------------
-
-To ensure that HTML outputs are displayed correctly in Galaxy
-(for example the Vsearch OTU table heatmaps), Galaxy needs to be
-configured not to sanitize the outputs from the ``Amplicon_analysis``
-tool.
-
-Either:
-
- * **For local instances:** set ``sanitize_all_html = False`` in
-   ``config/galaxy.ini`` (nb don't do this on production servers or
-   public instances!); or
-
- * **For production instances:** add the ``Amplicon_analysis`` tool
-   to the display whitelist in the Galaxy instance:
-
-   - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
-     ``config/galaxy.ini`` and restart Galaxy;
-   - Go to ``Admin>Manage Display Whitelist``, check the box for
-     ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
-     search function to help locate it) and click on
-     ``Submit new whitelist`` to update the settings.
-
-Additional details
-==================
-
-Some other things to be aware of:
-
- * Note that using the Silva database requires a minimum of 18Gb RAM
-
-Known problems
-==============
-
- * Only the ``VSEARCH`` pipeline in Mauro's script is currently
-   available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
-   pipelines have yet to be implemented.
- * The images in the tool help section are not visible if the
-   tool has been installed locally, or if it has been installed in
-   a Galaxy instance which is served from a subdirectory.
-
-   These are both problems with Galaxy and not the tool, see
-   https://github.com/galaxyproject/galaxy/issues/4490 and
-   https://github.com/galaxyproject/galaxy/issues/1676
-
-Appendix: availability of tool dependencies
-===========================================
-
-The tool takes its dependencies from the underlying pipeline script (see
-https://github.com/MTutino/Amplicon_analysis/blob/master/README.md
-for details).
-
-As noted above, currently the ``install_tool_deps.sh`` script can be
-used to manually install the dependencies for a local tool install.
-
-In principle these should also be available if the tool were installed
-from a toolshed. However it would be preferrable in this case to get as
-many of the dependencies as possible via the ``conda`` dependency
-resolver.
-
-The following are known to be available via conda, with the required
-version:
-
- - cutadapt 1.8.1
- - sickle-trim 1.33
- - bioawk 1.0
- - fastqc 0.11.3
- - R 3.2.0
- - spades 3.5.0
- - qiime 1.8.0
- - blast-legacy 2.2.26
- - vsearch 1.1.3
- - fasta-splitter 0.2.4
- - rdp_classifier 2.2
-
-The following dependencies are currently unavailable:
-
- - fasta_number (need 02jun2015)
- - microbiomeutil (need r20110519)
-
-(NB usearch 6.1.544 and 8.0.1623 are special cases which must be
-handled outside of Galaxy's dependency management systems.)
-
-History
-=======
-
-========== ======================================================================
-Version    Changes
----------- ----------------------------------------------------------------------
-1.2.2.1    Update to get dependencies from bioconda
-1.2.2.0    Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes
-           jackknifed analysis which is not captured by Galaxy tool)
-1.2.1.0    Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds
-           option to use the Human Oral Microbiome Database v15.1, and
-           updates SILVA database to v123)
-1.1.0      First official version on Galaxy toolshed.
-1.0.6      Expand inline documentation to provide detailed usage guidance.
-1.0.5      Updates including:
-
-           - Capture read counts from quality control as new output dataset
-           - Capture FastQC per-base quality boxplots for each sample as
-             new output dataset
-           - Add support for -l option (sliding window length for trimming)
-           - Default for -L set to "200"
-1.0.4      Various updates:
-
-           - Additional outputs are captured when a "Categories" file is
-             supplied (alpha diversity rarefaction curves and boxplots)
-           - Sample names derived from Fastqs in a collection of pairs
-             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
-           - Input Fastqs can now be of more general ``fastq`` type
-           - Log file outputs are captured in new output dataset
-           - User can specify a "title" for the job which is copied into
-             the dataset names (to distinguish outputs from different runs)
-           - Improved detection and reporting of problems with input
-             Metatable
-1.0.3      Take the sample names from the collection dataset names when
-           using collection as input (this is now the default input mode);
-           collect additional output dataset; disable ``usearch``-based
-           pipelines (i.e. ``UPARSE`` and ``QIIME``).
-1.0.2      Enable support for FASTQs supplied via dataset collections and
-           fix some broken output datasets.
-1.0.1      Initial version
-========== ======================================================================
--- a/amplicon_analysis_pipeline.py Mon Jun 18 07:54:54 2018 -0400
+++ b/amplicon_analysis_pipeline.py Mon Jun 18 10:05:58 2018 -0400
@@ -191,6 +191,7 @@
                 find_executable("fasta-splitter"))
     if fasta_splitter:
         os.symlink(vsearch,os.path.join("bin","fasta-splitter.pl"))
+        print "-- made symlink to %s" % fasta_splitter
     else:
         sys.stderr.write("Missing 'fasta-splitter[.pl]'\n")
--- a/tool_dependencies.xml Mon Jun 18 07:54:54 2018 -0400
+++ b/tool_dependencies.xml Mon Jun 18 10:05:58 2018 -0400
@@ -24,6 +24,10 @@
                     <source>THIRD_STEP.sh</source>
                     <destination>$INSTALL_DIR/Amplicon_analysis_pipeline</destination>
                 </action>
+                <action type="move_directory_files">
+                    <source>uc2otutab</source>
+                    <destination>$INSTALL_DIR/Amplicon_analysis_pipeline/uc2otutab</destination>
+                </action>
                 <action type="set_environment">
                     <environment_variable action="prepend_to" name="PATH">$INSTALL_DIR/Amplicon_analysis_pipeline</environment_variable>
                 </action>
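
Note on the ``amplicon_analysis_pipeline.py`` hunk: the context line passes ``vsearch`` to ``os.symlink`` while the new log message reports ``fasta_splitter``. The surrounding code is not shown in this changeset, so this may or may not be intentional. The following is a minimal sketch only, not the wrapper's actual code, of a variant in which the link target and the logged path are the same value. It is written for Python 3 (the wrapper itself uses Python 2 ``print``), and the function name ``link_fasta_splitter``, the ``finder`` parameter, the lookup order and the use of ``shutil.which`` in place of ``find_executable`` are all assumptions made for illustration.

```python
import os
import shutil
import sys
import tempfile


def link_fasta_splitter(bin_dir, finder=shutil.which):
    """Symlink the located fasta-splitter into bin_dir as 'fasta-splitter.pl'.

    Hypothetical helper: returns the path that was linked to, or None
    if neither executable name could be found.
    """
    # Look for either name; trying the '.pl' form first is an assumption
    target = finder("fasta-splitter.pl") or finder("fasta-splitter")
    if not target:
        sys.stderr.write("Missing 'fasta-splitter[.pl]'\n")
        return None
    os.makedirs(bin_dir, exist_ok=True)
    link = os.path.join(bin_dir, "fasta-splitter.pl")
    # Link to the executable that was actually located, so that the
    # symlink and the log message below refer to the same path
    os.symlink(target, link)
    print("-- made symlink to %s" % target)
    return target
```

Passing ``finder`` explicitly allows the lookup to be replaced in a test, for example with a function that returns the path of a placeholder file in a temporary directory.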