changeset 42:098ad1dd7760 draft
planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit 10be6f00106e853a6720e4052871d9d84e027137
author      pjbriggs
date        Thu, 05 Dec 2019 11:48:01 +0000
parents     7b9786a43a16
children    496cc0ddce3d
files       Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/.gitignore
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/.shed.yml
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/README.rst
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/amplicon_analysis_pipeline.py
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/amplicon_analysis_pipeline.xml
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis-1.3.5.sh
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis.sh
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig1.png
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig2.png
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig3.png
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/tool_dependencies.xml
            README.rst
            amplicon_analysis_pipeline.py
            amplicon_analysis_pipeline.xml
            install_amplicon_analysis-1.3.5.sh
            install_amplicon_analysis.sh
            outputs.txt
            static/images/Pipeline_description_Fig1.png
            static/images/Pipeline_description_Fig2.png
            static/images/Pipeline_description_Fig3.png
            tool_dependencies.xml
            updating-to-pipeline-1.3-DADA2.txt
diffstat    22 files changed, 2019 insertions(+), 1943 deletions(-)
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/.gitignore	Thu Dec 05 11:44:03 2019 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,7 +0,0 @@
-\#*\#
-.\#*
-*~
-*.pyc
-*.bak
-auto_process_settings_local.py
-settings.ini
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/.shed.yml	Thu Dec 05 11:44:03 2019 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,16 +0,0 @@
----
-categories:
-- Metagenomics
-description: Analyse paired-end 16S rRNA data from Illumina Miseq
-homepage_url: https://github.com/MTutino/Amplicon_analysis
-long_description: |
-  A Galaxy tool wrapper to Mauro Tutino's Amplicon_analysis pipeline
-  at https://github.com/MTutino/Amplicon_analysis
-
-  The pipeline can analyse paired-end 16S rRNA data from Illumina Miseq
-  (Casava >= 1.8) and performs: QC and clean up of input data; removal of
-  singletons and chimeras and building of OTU table and phylogenetic tree;
-  beta and alpha diversity analysis
-name: amplicon_analysis_pipeline
-owner: pjbriggs
-remote_repository_url: https://github.com/pjbriggs/Amplicon_analysis-galaxy
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/README.rst	Thu Dec 05 11:44:03 2019 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,213 +0,0 @@
-Amplicon_analysis-galaxy
-========================
-
-A Galaxy tool wrapper to Mauro Tutino's ``Amplicon_analysis`` pipeline
-script at https://github.com/MTutino/Amplicon_analysis
-
-The pipeline can analyse paired-end 16S rRNA data from Illumina Miseq
-(Casava >= 1.8) and performs the following operations:
-
- * QC and clean up of input data
- * Removal of singletons and chimeras and building of OTU table
-   and phylogenetic tree
- * Beta and alpha diversity analysis
-
-Usage documentation
-===================
-
-Usage of the tool (including required inputs) is documented within
-the ``help`` section of the tool XML.
-
-Installing the tool in a Galaxy instance
-========================================
-
-The following sections describe how to install the tool files,
-dependencies and reference data, and how to configure the Galaxy
-instance to detect the dependencies and reference data correctly
-at run time.
-
-1. Install the tool from the toolshed
--------------------------------------
-
-The core tool is hosted on the Galaxy toolshed, so it can be installed
-directly from there (this is the recommended route):
-
- * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/
-
-Alternatively it can be installed manually; in this case there are two
-files to install:
-
- * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
- * ``amplicon_analysis_pipeline.py`` (the Python wrapper script)
-
-Put these in a directory that is visible to Galaxy (e.g. a
-``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
-file to tell Galaxy to offer the tool by adding a line, e.g.::
-
-    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
-
-2. Install the reference data
------------------------------
-
-The script ``References.sh`` from the pipeline package at
-https://github.com/MTutino/Amplicon_analysis can be run to install
-the reference data, for example::
-
-    cd /path/to/pipeline/data
-    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
-    /bin/bash ./References.sh
-
-will install the data in ``/path/to/pipeline/data``.
-
-**NB** The final amount of data downloaded and uncompressed will be
-around 9GB.
-
-3. Configure reference data location in Galaxy
-----------------------------------------------
-
-The final step is to make your Galaxy installation aware of the
-location of the reference data, so that the reference data can be
-located when the tool is run.
-
-The tool locates the reference data via an environment variable called
-``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
-directory where the reference data has been installed.
-
-There are various ways to do this, depending on how your Galaxy
-installation is configured:
-
- * **For local instances:** add a line to set it in the
-   ``config/local_env.sh`` file of your Galaxy installation (you
-   may need to create a new empty file first), e.g.::
-
-       export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data
-
- * **For production instances:** set the value in the ``job_conf.xml``
-   configuration file, e.g.::
-
-       <destination id="amplicon_analysis">
-           <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
-       </destination>
-
-   and then specify that the pipeline tool uses this destination::
-
-       <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>
-
-   (For more about job destinations see the Galaxy documentation at
-   https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations)
-
-4. Enable rendering of HTML outputs from pipeline
--------------------------------------------------
-
-To ensure that HTML outputs are displayed correctly in Galaxy
-(for example the Vsearch OTU table heatmaps), Galaxy needs to be
-configured not to sanitize the outputs from the ``Amplicon_analysis``
-tool.
-
-Either:
-
- * **For local instances:** set ``sanitize_all_html = False`` in
-   ``config/galaxy.ini`` (NB don't do this on production servers or
-   public instances!); or
-
- * **For production instances:** add the ``Amplicon_analysis`` tool
-   to the display whitelist in the Galaxy instance:
-
-   - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
-     ``config/galaxy.ini`` and restart Galaxy;
-   - Go to ``Admin>Manage Display Whitelist``, check the box for
-     ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
-     search function to help locate it) and click on
-     ``Submit new whitelist`` to update the settings.
-
-Additional details
-==================
-
-Some other things to be aware of:
-
- * Note that using the Silva database requires a minimum of 18GB of RAM
-
-Known problems
-==============
-
- * Only the ``VSEARCH`` pipeline in Mauro's script is currently
-   available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
-   pipelines have yet to be implemented.
- * The images in the tool help section are not visible if the
-   tool has been installed locally, or if it has been installed in
-   a Galaxy instance which is served from a subdirectory.
-
-   These are both problems with Galaxy and not the tool, see
-   https://github.com/galaxyproject/galaxy/issues/4490 and
-   https://github.com/galaxyproject/galaxy/issues/1676
-
-Appendix: installing the dependencies manually
-==============================================
-
-If the tool is installed from the Galaxy toolshed (recommended) then
-the dependencies should be installed automatically and this step can
-be skipped.
-
-Otherwise the ``install_amplicon_analysis.sh`` script can be used
-to fetch and install the dependencies locally, for example::
-
-    install_amplicon_analysis.sh /path/to/local_tool_dependencies
-
-(This is the same script as is used to install dependencies from the
-toolshed.) This can take some time to complete; when finished it will
-have created a directory called ``Amplicon_analysis-1.2.3`` containing
-the dependencies under the specified top level directory.
-
-**NB** The installed dependencies will occupy around 2.6GB of disk
-space.
-
-You will need to make sure that the ``bin`` subdirectory of this
-directory is on Galaxy's ``PATH`` at runtime, for the tool to be able
-to access the dependencies - for example by adding a line to the
-``local_env.sh`` file like::
-
-    export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH
-
-History
-=======
-
-========== ======================================================================
-Version    Changes
----------- ----------------------------------------------------------------------
-1.3.5.0    Updated to Amplicon_Analysis_Pipeline version 1.3.5.
-1.2.3.0    Updated to Amplicon_Analysis_Pipeline version 1.2.3; install
-           dependencies via tool_dependencies.xml.
-1.2.2.0    Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes
-           jackknifed analysis which is not captured by Galaxy tool)
-1.2.1.0    Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds
-           option to use the Human Oral Microbiome Database v15.1, and
-           updates SILVA database to v123)
-1.1.0      First official version on Galaxy toolshed.
-1.0.6      Expand inline documentation to provide detailed usage guidance.
-1.0.5      Updates including:
-
-           - Capture read counts from quality control as new output dataset
-           - Capture FastQC per-base quality boxplots for each sample as
-             new output dataset
-           - Add support for -l option (sliding window length for trimming)
-           - Default for -L set to "200"
-1.0.4      Various updates:
-
-           - Additional outputs are captured when a "Categories" file is
-             supplied (alpha diversity rarefaction curves and boxplots)
-           - Sample names derived from Fastqs in a collection of pairs
-             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
-           - Input Fastqs can now be of more general ``fastq`` type
-           - Log file outputs are captured in new output dataset
-           - User can specify a "title" for the job which is copied into
-             the dataset names (to distinguish outputs from different runs)
-           - Improved detection and reporting of problems with input
-             Metatable
-1.0.3      Take the sample names from the collection dataset names when
-           using collection as input (this is now the default input mode);
-           collect additional output dataset; disable ``usearch``-based
-           pipelines (i.e. ``UPARSE`` and ``QIIME``).
-1.0.2      Enable support for FASTQs supplied via dataset collections and
-           fix some broken output datasets.
-1.0.1      Initial version
-========== ======================================================================
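The "trimmed to SAMPLE_S*" behaviour noted in the 1.0.4 changes strips the file extension and any trailing Illumina lane/read-set suffixes (e.g. ``_L001_001``) from a Fastq name. A minimal standalone sketch of that trimming (the function name ``trim_sample_name`` is illustrative only; the wrapper script implements this as ``clean_up_name``):

```python
def trim_sample_name(filename):
    # Drop everything from the first "." (e.g. ".fastq" or ".fastq.gz")
    name = filename.split('.')[0]
    parts = name.split('_')
    # Drop a trailing Illumina read-set chunk ("001")
    if parts and parts[-1] == "001":
        parts = parts[:-1]
    # Drop a trailing lane chunk such as "L001"
    if parts and parts[-1].startswith('L') and parts[-1][1:].isdigit():
        parts = parts[:-1]
    return '_'.join(parts)

print(trim_sample_name("SAMPLE_S19_L001_001.fastq"))  # -> SAMPLE_S19
```

A plain name such as ``SampleA.fastq`` simply loses its extension, so non-Illumina-style names pass through unchanged.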
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/amplicon_analysis_pipeline.py	Thu Dec 05 11:44:03 2019 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,370 +0,0 @@
-#!/usr/bin/env python
-#
-# Wrapper script to run Amplicon_analysis_pipeline.sh
-# from Galaxy tool
-
-import sys
-import os
-import argparse
-import subprocess
-import glob
-
-class PipelineCmd(object):
-    def __init__(self,cmd):
-        self.cmd = [str(cmd)]
-    def add_args(self,*args):
-        for arg in args:
-            self.cmd.append(str(arg))
-    def __repr__(self):
-        return ' '.join([str(arg) for arg in self.cmd])
-
-def ahref(target,name=None,type=None):
-    if name is None:
-        name = os.path.basename(target)
-    ahref = "<a href='%s'" % target
-    if type is not None:
-        ahref += " type='%s'" % type
-    ahref += ">%s</a>" % name
-    return ahref
-
-def check_errors():
-    # Errors in Amplicon_analysis_pipeline.log
-    with open('Amplicon_analysis_pipeline.log','r') as pipeline_log:
-        log = pipeline_log.read()
-    if "Names in the first column of Metatable.txt and in the second column of Final_name.txt do not match" in log:
-        print_error("""*** Sample IDs don't match dataset names ***
-
-The sample IDs (first column of the Metatable file) don't match the
-supplied sample names for the input Fastq pairs.
-""")
-    # Errors in pipeline output
-    with open('pipeline.log','r') as pipeline_log:
-        log = pipeline_log.read()
-    if "Errors and/or warnings detected in mapping file" in log:
-        with open("Metatable_log/Metatable.log","r") as metatable_log:
-            # Echo the Metatable log file to the tool log
-            print_error("""*** Error in Metatable mapping file ***
-
-%s""" % metatable_log.read())
-    elif "No header line was found in mapping file" in log:
-        # Report error to the tool log
-        print_error("""*** No header in Metatable mapping file ***
-
-Check you've specified the correct file as the input Metatable""")
-
-def print_error(message):
-    width = max([len(line) for line in message.split('\n')]) + 4
-    sys.stderr.write("\n%s\n" % ('*'*width))
-    for line in message.split('\n'):
-        sys.stderr.write("* %s%s *\n" % (line,' '*(width-len(line)-4)))
-    sys.stderr.write("%s\n\n" % ('*'*width))
-
-def clean_up_name(sample):
-    # Remove extensions and trailing "_L[0-9]+_001" from
-    # Fastq pair names
-    sample_name = '.'.join(sample.split('.')[:1])
-    split_name = sample_name.split('_')
-    if split_name[-1] == "001":
-        split_name = split_name[:-1]
-    if split_name[-1].startswith('L'):
-        try:
-            int(split_name[-1][1:])
-            split_name = split_name[:-1]
-        except ValueError:
-            pass
-    return '_'.join(split_name)
-
-def list_outputs(filen=None):
-    # List the output directory contents
-    # If filen is specified then will be the filename to
-    # write to, otherwise write to stdout
-    if filen is not None:
-        fp = open(filen,'w')
-    else:
-        fp = sys.stdout
-    results_dir = os.path.abspath("RESULTS")
-    fp.write("Listing contents of output dir %s:\n" % results_dir)
-    ix = 0
-    for d,dirs,files in os.walk(results_dir):
-        ix += 1
-        fp.write("-- %d: %s\n" % (ix,
-                                  os.path.relpath(d,results_dir)))
-        for f in files:
-            ix += 1
-            fp.write("---- %d: %s\n" % (ix,
-                                        os.path.relpath(f,results_dir)))
-    # Close output file
-    if filen is not None:
-        fp.close()
-
-if __name__ == "__main__":
-    # Command line
-    print "Amplicon analysis: starting"
-    p = argparse.ArgumentParser()
-    p.add_argument("metatable",
-                   metavar="METATABLE_FILE",
-                   help="Metatable.txt file")
-    p.add_argument("fastq_pairs",
-                   metavar="SAMPLE_NAME FQ_R1 FQ_R2",
-                   nargs="+",
-                   default=list(),
-                   help="Triplets of SAMPLE_NAME followed by "
-                   "a R1/R2 FASTQ file pair")
-    p.add_argument("-g",dest="forward_pcr_primer")
-    p.add_argument("-G",dest="reverse_pcr_primer")
-    p.add_argument("-q",dest="trimming_threshold")
-    p.add_argument("-O",dest="minimum_overlap")
-    p.add_argument("-L",dest="minimum_length")
-    p.add_argument("-l",dest="sliding_window_length")
-    p.add_argument("-P",dest="pipeline",
-                   choices=["Vsearch","DADA2"],
-                   type=str,
-                   default="Vsearch")
-    p.add_argument("-S",dest="use_silva",action="store_true")
-    p.add_argument("-H",dest="use_homd",action="store_true")
-    p.add_argument("-r",dest="reference_data_path")
-    p.add_argument("-c",dest="categories_file")
-    args = p.parse_args()
-
-    # Build the environment for running the pipeline
-    print "Amplicon analysis: building the environment"
-    metatable_file = os.path.abspath(args.metatable)
-    os.symlink(metatable_file,"Metatable.txt")
-    print "-- made symlink to Metatable.txt"
-
-    # Link to Categories.txt file (if provided)
-    if args.categories_file is not None:
-        categories_file = os.path.abspath(args.categories_file)
-        os.symlink(categories_file,"Categories.txt")
-        print "-- made symlink to Categories.txt"
-
-    # Link to FASTQs and construct Final_name.txt file
-    sample_names = []
-    print "-- making Final_name.txt"
-    with open("Final_name.txt",'w') as final_name:
-        fastqs = iter(args.fastq_pairs)
-        for sample_name,fqr1,fqr2 in zip(fastqs,fastqs,fastqs):
-            sample_name = clean_up_name(sample_name)
-            print "   %s" % sample_name
-            r1 = "%s_R1_.fastq" % sample_name
-            r2 = "%s_R2_.fastq" % sample_name
-            os.symlink(fqr1,r1)
-            os.symlink(fqr2,r2)
-            final_name.write("%s\n" % '\t'.join((r1,sample_name)))
-            final_name.write("%s\n" % '\t'.join((r2,sample_name)))
-            sample_names.append(sample_name)
-
-    # Reference database
-    if args.use_silva:
-        ref_database = "silva"
-    elif args.use_homd:
-        ref_database = "homd"
-    else:
-        ref_database = "gg"
-
-    # Construct the pipeline command
-    print "Amplicon analysis: constructing pipeline command"
-    pipeline = PipelineCmd("Amplicon_analysis_pipeline.sh")
-    if args.forward_pcr_primer:
-        pipeline.add_args("-g",args.forward_pcr_primer)
-    if args.reverse_pcr_primer:
-        pipeline.add_args("-G",args.reverse_pcr_primer)
-    if args.trimming_threshold:
-        pipeline.add_args("-q",args.trimming_threshold)
-    if args.minimum_overlap:
-        pipeline.add_args("-O",args.minimum_overlap)
-    if args.minimum_length:
-        pipeline.add_args("-L",args.minimum_length)
-    if args.sliding_window_length:
-        pipeline.add_args("-l",args.sliding_window_length)
-    if args.reference_data_path:
-        pipeline.add_args("-r",args.reference_data_path)
-    pipeline.add_args("-P",args.pipeline)
-    if ref_database == "silva":
-        pipeline.add_args("-S")
-    elif ref_database == "homd":
-        pipeline.add_args("-H")
-
-    # Echo the pipeline command to stdout
-    print "Running %s" % pipeline
-
-    # Run the pipeline
-    with open("pipeline.log","w") as pipeline_out:
-        try:
-            subprocess.check_call(pipeline.cmd,
-                                  stdout=pipeline_out,
-                                  stderr=subprocess.STDOUT)
-            exit_code = 0
-            print "Pipeline completed ok"
-        except subprocess.CalledProcessError as ex:
-            # Non-zero exit status
-            sys.stderr.write("Pipeline failed: exit code %s\n" %
-                             ex.returncode)
-            exit_code = ex.returncode
-        except Exception as ex:
-            # Some other problem
-            sys.stderr.write("Unexpected error: %s\n" % str(ex))
-            exit_code = 1
-
-    # Write out the list of outputs
-    outputs_file = "Pipeline_outputs.txt"
-    list_outputs(outputs_file)
-
-    # Check for log file
-    log_file = "Amplicon_analysis_pipeline.log"
-    if os.path.exists(log_file):
-        print "Found log file: %s" % log_file
-        if exit_code == 0:
-            # Create an HTML file to link to log files etc
-            # NB the paths to the files should be correct once
-            # copied by Galaxy on job completion
-            with open("pipeline_outputs.html","w") as html_out:
-                html_out.write("""<html>
-<head>
-<title>Amplicon analysis pipeline: log files</title>
-</head>
-<body>
-<h1>Amplicon analysis pipeline: log files</h1>
-<ul>
-""")
-                html_out.write(
-                    "<li>%s</li>\n" %
-                    ahref("Amplicon_analysis_pipeline.log",
-                          type="text/plain"))
-                html_out.write(
-                    "<li>%s</li>\n" %
-                    ahref("pipeline.log",type="text/plain"))
-                html_out.write(
-                    "<li>%s</li>\n" %
-                    ahref("Pipeline_outputs.txt",
-                          type="text/plain"))
-                html_out.write(
-                    "<li>%s</li>\n" %
-                    ahref("Metatable.html"))
-                html_out.write("""</ul>
-</body>
-</html>
-""")
-        else:
-            # Check for known error messages
-            check_errors()
-            # Write pipeline stdout to tool stderr
-            sys.stderr.write("\nOutput from pipeline:\n")
-            with open("pipeline.log",'r') as log:
-                sys.stderr.write("%s" % log.read())
-            # Write log file contents to tool log
-            print "\nAmplicon_analysis_pipeline.log:"
-            with open(log_file,'r') as log:
-                print "%s" % log.read()
-    else:
-        sys.stderr.write("ERROR missing log file \"%s\"\n" %
-                         log_file)
-
-    # Handle FastQC boxplots
-    print "Amplicon analysis: collating per base quality boxplots"
-    with open("fastqc_quality_boxplots.html","w") as quality_boxplots:
-        # PHRED value for trimming
-        phred_score = 20
-        if args.trimming_threshold is not None:
-            phred_score = args.trimming_threshold
-        # Write header for HTML output file
-        quality_boxplots.write("""<html>
-<head>
-<title>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</title>
-</head>
-<body>
-<h1>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</h1>
-""")
-        # Look for raw and trimmed FastQC output for each sample
-        for sample_name in sample_names:
-            fastqc_dir = os.path.join(sample_name,"FastQC")
-            quality_boxplots.write("<h2>%s</h2>" % sample_name)
-            for d in ("Raw","cutdapt_sickle/Q%s" % phred_score):
-                quality_boxplots.write("<h3>%s</h3>" % d)
-                fastqc_html_files = glob.glob(
-                    os.path.join(fastqc_dir,d,"*_fastqc.html"))
-                if not fastqc_html_files:
-                    quality_boxplots.write("<p>No FastQC outputs found</p>")
-                    continue
-                # Pull out the per-base quality boxplots
-                for f in fastqc_html_files:
-                    boxplot = None
-                    with open(f) as fp:
-                        for line in fp.read().split(">"):
-                            try:
-                                line.index("alt=\"Per base quality graph\"")
-                                boxplot = line + ">"
-                                break
-                            except ValueError:
-                                pass
-                    if boxplot is None:
-                        boxplot = "Missing plot"
-                    quality_boxplots.write("<h4>%s</h4><p>%s</p>" %
-                                           (os.path.basename(f),
-                                            boxplot))
-        quality_boxplots.write("""</body>
-</html>
-""")
-
-    # Handle DADA2 error rate plot PDFs
-    if args.pipeline == "DADA2":
-        print "Amplicon analysis: collecting error rate plots"
-        error_rate_plots_dir = os.path.abspath(
-            os.path.join("DADA2_OTU_tables",
-                         "Error_rate_plots"))
-        error_rate_plot_pdfs = [os.path.basename(pdf)
-                                for pdf in
-                                sorted(glob.glob(
-                                    os.path.join(error_rate_plots_dir,"*.pdf")))]
-        with open("error_rate_plots.html","w") as error_rate_plots_out:
-            error_rate_plots_out.write("""<html>
-<head>
-<title>Amplicon analysis pipeline: DADA2 Error Rate Plots</title>
-</head>
-<body>
-<h1>Amplicon analysis pipeline: DADA2 Error Rate Plots</h1>
-""")
-            error_rate_plots_out.write("<ul>\n")
-            for pdf in error_rate_plot_pdfs:
-                error_rate_plots_out.write("<li>%s</li>\n" % ahref(pdf))
-            error_rate_plots_out.write("</ul>\n")
-            error_rate_plots_out.write("""</body>
-</html>
-""")
-
-    # Handle additional output when categories file was supplied
-    if args.categories_file is not None:
-        # Alpha diversity boxplots
-        print "Amplicon analysis: indexing alpha diversity boxplots"
-        boxplots_dir = os.path.abspath(
-            os.path.join("RESULTS",
-                         "%s_%s" % (args.pipeline,
-                                    ref_database),
-                         "Alpha_diversity",
-                         "Alpha_diversity_boxplot",
-                         "Categories_shannon"))
-        print "Amplicon analysis: gathering PDFs from %s" % boxplots_dir
-        boxplot_pdfs = [os.path.basename(pdf)
-                        for pdf in
-                        sorted(glob.glob(
-                            os.path.join(boxplots_dir,"*.pdf")))]
-        with open("alpha_diversity_boxplots.html","w") as boxplots_out:
-            boxplots_out.write("""<html>
-<head>
-<title>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</title>
-</head>
-<body>
-<h1>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</h1>
-""")
-            boxplots_out.write("<ul>\n")
-            for pdf in boxplot_pdfs:
-                boxplots_out.write("<li>%s</li>\n" % ahref(pdf))
-            boxplots_out.write("</ul>\n")
-            boxplots_out.write("""</body>
-</html>
-""")
-
-    # Finish
-    print "Amplicon analysis: finishing, exit code: %s" % exit_code
-    sys.exit(exit_code)
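The wrapper above consumes its flat positional `fastq_pairs` argument list three items at a time using the zip-over-a-single-iterator idiom (`zip(fastqs,fastqs,fastqs)`). A standalone sketch of how that grouping behaves (the helper name `group_triplets` is illustrative only):

```python
def group_triplets(flat_args):
    # zip'ing three references to the same iterator yields
    # consecutive, non-overlapping (sample, R1, R2) triplets
    it = iter(flat_args)
    return list(zip(it, it, it))

args = ["SampleA", "A_R1.fq", "A_R2.fq",
        "SampleB", "B_R1.fq", "B_R2.fq"]
print(group_triplets(args))
# -> [('SampleA', 'A_R1.fq', 'A_R2.fq'), ('SampleB', 'B_R1.fq', 'B_R2.fq')]
```

Note that any trailing arguments that do not form a complete triplet are silently dropped by `zip`, which is why the tool XML always emits name/R1/R2 together for each pair.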
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/amplicon_analysis_pipeline.xml Thu Dec 05 11:44:03 2019 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,502 +0,0 @@ -<tool id="amplicon_analysis_pipeline" name="Amplicon Analysis Pipeline" version="1.3.5.0"> - <description>analyse 16S rRNA data from Illumina Miseq paired-end reads</description> - <requirements> - <requirement type="package" version="1.3.5">amplicon_analysis_pipeline</requirement> - </requirements> - <stdio> - <exit_code range="1:" /> - </stdio> - <command><![CDATA[ - - ## Convenience variable for pipeline name - #set $pipeline_name = $pipeline.pipeline_name - - ## Set the reference database name - #if str( $pipeline_name ) == "DADA2" - #set reference_database_name = "silva" - #else - #set reference_database = $pipeline.reference_database - #if $reference_database == "-S" - #set reference_database_name = "silva" - #else if $reference_database == "-H" - #set reference_database_name = "homd" - #else - #set reference_database_name = "gg" - #end if - #end if - - ## Run the amplicon analysis pipeline wrapper - python $__tool_directory__/amplicon_analysis_pipeline.py - ## Set options - #if str( $forward_pcr_primer ) != "" - -g "$forward_pcr_primer" - #end if - #if str( $reverse_pcr_primer ) != "" - -G "$reverse_pcr_primer" - #end if - #if str( $trimming_threshold ) != "" - -q $trimming_threshold - #end if - #if str( $sliding_window_length ) != "" - -l $sliding_window_length - #end if - #if str( $minimum_overlap ) != "" - -O $minimum_overlap - #end if - #if str( $minimum_length ) != "" - -L $minimum_length - #end if - -P $pipeline_name - -r \${AMPLICON_ANALYSIS_REF_DATA_PATH-ReferenceData} - #if str( $pipeline_name ) != "DADA2" - ${reference_database} - #end if - #if str($categories_file_in) != 'None' - -c "${categories_file_in}" - #end if - ## Input files - "${metatable_file_in}" - ## FASTQ pairs - #if str($input_type.pairs_or_collection) == "collection" - #set fastq_pairs = 
$input_type.fastq_collection - #else - #set fastq_pairs = $input_type.fastq_pairs - #end if - #for $fq_pair in $fastq_pairs - "${fq_pair.name}" "${fq_pair.forward}" "${fq_pair.reverse}" - #end for - && - - ## Collect outputs - cp Metatable_log/Metatable_mod.txt "${metatable_mod}" && - #if str( $pipeline_name ) == "Vsearch" - # Vsearch-specific - cp ${pipeline_name}_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom "${tax_otu_table_biom_file}" && - cp Multiplexed_files/${pipeline_name}_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta "${dereplicated_nonchimera_otus_fasta}" && - cp QUALITY_CONTROL/Reads_count.txt "$read_counts_out" && - #else - # DADA2-specific - cp ${pipeline_name}_OTU_tables/DADA2_tax_OTU_table.biom "${tax_otu_table_biom_file}" && - cp ${pipeline_name}_OTU_tables/seqs.fa "${dereplicated_nonchimera_otus_fasta}" && - #end if - cp ${pipeline_name}_OTU_tables/otus.tre "${otus_tre_file}" && - cp RESULTS/${pipeline_name}_${reference_database_name}/OTUs_count.txt "${otus_count_file}" && - cp RESULTS/${pipeline_name}_${reference_database_name}/table_summary.txt "${table_summary_file}" && - cp fastqc_quality_boxplots.html "${fastqc_quality_boxplots_html}" && - - ## OTU table heatmap - cp RESULTS/${pipeline_name}_${reference_database_name}/Heatmap.pdf "${heatmap_otu_table_pdf}"" && - - ## HTML outputs - - ## Phylum genus barcharts - mkdir $phylum_genus_dist_barcharts_html.files_path && - cp -r RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/charts $phylum_genus_dist_barcharts_html.files_path && - cp -r RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/raw_data $phylum_genus_dist_barcharts_html.files_path && - cp RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/bar_charts.html "${phylum_genus_dist_barcharts_html}" && - - ## Beta diversity weighted 2d plots - mkdir $beta_div_even_weighted_2d_plots.files_path && - cp -r 
RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/weighted_2d_plot/* $beta_div_even_weighted_2d_plots.files_path && - cp RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_weighted_2d_plots}" && - - ## Beta diversity unweighted 2d plots - mkdir $beta_div_even_unweighted_2d_plots.files_path && - cp -r RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/unweighted_2d_plot/* $beta_div_even_unweighted_2d_plots.files_path && - cp RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_unweighted_2d_plots}" && - - ## Alpha diversity rarefaction plots - mkdir $alpha_div_rarefaction_plots.files_path && - cp RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/rarefaction_curves/rarefaction_plots.html $alpha_div_rarefaction_plots && - cp -r RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/rarefaction_curves/average_plots $alpha_div_rarefaction_plots.files_path && - - ## DADA2 error rate plots - #if str($pipeline_name) == "DADA2" - mkdir $dada2_error_rate_plots.files_path && - cp DADA2_OTU_tables/Error_rate_plots/error_rate_plots.html $dada2_error_rate_plots && - cp -r DADA2_OTU_tables/Error_rate_plots/*.pdf $dada2_error_rate_plots.files_path && - #end if - - ## Categories data - #if str($categories_file_in) != 'None' - ## Alpha diversity boxplots - mkdir $alpha_div_boxplots.files_path && - cp alpha_diversity_boxplots.html "$alpha_div_boxplots" && - cp RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf $alpha_div_boxplots.files_path && - #end if - - ## Pipeline outputs (log files etc) - mkdir $log_files.files_path && - cp Amplicon_analysis_pipeline.log $log_files.files_path && - cp pipeline.log $log_files.files_path && - cp Pipeline_outputs.txt $log_files.files_path && - cp 
Metatable_log/Metatable.html $log_files.files_path && - cp pipeline_outputs.html "$log_files" - ]]></command> - <inputs> - <param name="title" type="text" value="test" size="25" - label="Title" help="Optional text that will be added to the output dataset names" /> - <param type="data" name="metatable_file_in" format="tabular" - label="Input Metatable.txt file" /> - <param type="data" name="categories_file_in" format="txt" - label="Input Categories.txt file" optional="true" - help="(optional)" /> - <conditional name="input_type"> - <param name="pairs_or_collection" type="select" - label="Input FASTQ type"> - <option value="pairs_of_files">Pairs of datasets</option> - <option value="collection" selected="true">Dataset pairs in a collection</option> - </param> - <when value="collection"> - <param name="fastq_collection" type="data_collection" - format="fastqsanger,fastq" collection_type="list:paired" - label="Collection of FASTQ forward and reverse (R1/R2) pairs" - help="Each FASTQ pair will be treated as one sample; the name of each sample will be taken from the first column of the Metatable file " /> - </when> - <when value="pairs_of_files"> - <repeat name="fastq_pairs" title="Input fastq pairs" min="1"> - <param type="text" name="name" value="" - label="Final name for FASTQ pair" /> - <param type="data" name="fastq_r1" format="fastqsanger,fastq" - label="FASTQ with forward reads (R1)" /> - <param type="data" name="fastq_r2" format="fastqsanger,fastq" - label="FASTQ with reverse reads (R2)" /> - </repeat> - </when> - </conditional> - <param type="text" name="forward_pcr_primer" value="" - label="Forward PCR primer sequence" - help="Optional; must not include barcode or adapter sequence (-g)" /> - <param type="text" name="reverse_pcr_primer" value="" - label="Reverse PCR primer sequence" - help="Optional; must not include barcode or adapter sequence (-G)" /> - <param type="integer" name="trimming_threshold" value="20" - label="Threshold quality below which read will 
be trimmed" - help="Phred score; default is 20 (-q)" /> - <param type="integer" name="minimum_overlap" value="10" - label="Minimum overlap in bp between forward and reverse reads" - help="Default is 10 (-O)" /> - <param type="integer" name="minimum_length" value="200" - label="Minimum length in bp to keep sequence after overlapping" - help="Default is 200 (-L)" /> - <param type="integer" name="sliding_window_length" value="10" - label="Minimum length in bp to retain a read after trimming" - help="Supplied to Sickle; default is 10 (-l)" /> - <conditional name="pipeline"> - <param type="select" name="pipeline_name" - label="Pipeline to use for analysis"> - <option value="Vsearch" selected="true" >Vsearch</option> - <option value="DADA2">DADA2</option> - </param> - <when value="Vsearch"> - <param type="select" name="reference_database" - label="Reference database"> - <option value="" selected="true">GreenGenes</option> - <option value="-S">Silva</option> - <option value="-H">Human Oral Microbiome Database (HOMD)</option> - </param> - </when> - <when value="DADA2"> - </when> - </conditional> - </inputs> - <outputs> - <data format="tabular" name="metatable_mod" - label="${tool.name}:${title} Metatable_mod.txt" /> - <data format="tabular" name="read_counts_out" - label="${tool.name} (${pipeline.pipeline_name}):${title} read counts"> - <filter>pipeline['pipeline_name'] == 'Vsearch'</filter> - </data> - <data format="biom" name="tax_otu_table_biom_file" - label="${tool.name} (${pipeline.pipeline_name}):${title} tax OTU table (biom format)" /> - <data format="tabular" name="otus_tre_file" - label="${tool.name} (${pipeline.pipeline_name}):${title} otus.tre" /> - <data format="html" name="phylum_genus_dist_barcharts_html" - label="${tool.name} (${pipeline.pipeline_name}):${title} phylum genus dist barcharts HTML" /> - <data format="tabular" name="otus_count_file" - label="${tool.name} (${pipeline.pipeline_name}):${title} OTUs count file" /> - <data format="tabular" 
name="table_summary_file" - label="${tool.name} (${pipeline.pipeline_name}):${title} table summary file" /> - <data format="fasta" name="dereplicated_nonchimera_otus_fasta" - label="${tool.name} (${pipeline.pipeline_name}):${title} multiplexed linearized dereplicated mc2 repset nonchimeras OTUs FASTA" /> - <data format="html" name="fastqc_quality_boxplots_html" - label="${tool.name} (${pipeline.pipeline_name}):${title} FastQC per-base quality boxplots HTML" /> - <data format="pdf" name="heatmap_otu_table_pdf" - label="${tool.name} (${pipeline.pipeline_name}):${title} heatmap OTU table PDF" /> - <data format="html" name="beta_div_even_weighted_2d_plots" - label="${tool.name} (${pipeline.pipeline_name}):${title} beta diversity weighted 2D plots HTML" /> - <data format="html" name="beta_div_even_unweighted_2d_plots" - label="${tool.name} (${pipeline.pipeline_name}):${title} beta diversity unweighted 2D plots HTML" /> - <data format="html" name="alpha_div_rarefaction_plots" - label="${tool.name} (${pipeline.pipeline_name}):${title} alpha diversity rarefaction plots HTML" /> - <data format="html" name="dada2_error_rate_plots" - label="${tool.name} (${pipeline.pipeline_name}):${title} DADA2 error rate plots"> - <filter>pipeline['pipeline_name'] == 'DADA2'</filter> - </data> - <data format="html" name="alpha_div_boxplots" - label="${tool.name} (${pipeline.pipeline_name}):${title} alpha diversity boxplots"> - <filter>categories_file_in is not None</filter> - </data> - <data format="html" name="log_files" - label="${tool.name} (${pipeline.pipeline_name}):${title} log files" /> - </outputs> - <tests> - </tests> - <help><![CDATA[ - -What it does ------------- - -This pipeline has been designed for the analysis of 16S rRNA data from -Illumina Miseq (Casava >= 1.8) paired-end reads. - -Usage ------ - -1. 
Preparation of the mapping file and format of unique sample id -***************************************************************** - -Before using the amplicon analysis pipeline it is necessary to -follow the steps below to avoid analysis failures and ensure samples -are labelled appropriately. Sample names for the labelling are derived -from the fastq file names that are generated by the sequencing. The -labels will include everything between the beginning of the name and -the sample number (from C11 to S19 in Fig. 1). - -.. image:: Pipeline_description_Fig1.png - :height: 46 - :width: 382 - -**Figure 1** - -If analysing 16S data from multiple runs: - -The samples from different runs may have identical IDs. For example, -when sequencing the same samples twice, these could by chance be at -the same position in both runs. This would cause the fastq files -to have exactly the same IDs (Fig. 2). - -.. image:: Pipeline_description_Fig2.png - :height: 100 - :width: 463 - -**Figure 2** - -In case of identical sample IDs the pipeline will fail to run and -will generate an error at the beginning of the analysis. - -To avoid having to change the file names, ensure before uploading the -files that the sample IDs are not repeated. - -2. To upload the file -********************* - -Click on **Get Data/Upload File** in the Galaxy tool panel on the -left hand side. - -From the pop-up window, choose how to upload the file. The -**Choose local file** option can be used for files up to 4Gb. Fastq files -from Illumina MiSeq will rarely be bigger than 4Gb and this option is -recommended. - -After choosing the files click **Start** to begin the upload. The window can -now be closed and the files will be uploaded onto the Galaxy server. You -will see the progress in the ``HISTORY`` panel on the right -side of the screen. The colour will change from grey (queuing) to yellow -(uploading) and finally green (uploaded). 
- -Once all the files are uploaded, click on the operations on multiple -datasets icon and select the fastq files that need to be analysed. -Click on the tab **For all selected...** and on the option -**Build List of Dataset pairs** (Fig. 3). - -.. image:: Pipeline_description_Fig3.png - :height: 247 - :width: 586 - -**Figure 3** - -Change the filter parameters ``_1`` and ``_2`` to be ``_R1`` and ``_R2``. -The forward (R1) and reverse (R2) fastq files should now appear in the -corresponding columns. - -Select **Autopair**. This creates a collection of paired fastq files for -the forward and reverse reads for each sample. The names of the pairs will -be the ones used by the pipeline. You are free to change the names at this -point as long as they are the same as those used in the Metatable file -(see section 3). - -Name the collection and click on **create list**. This reduces the time -required to input the forward and reverse reads for each individual sample. - -3. Create the Metatable files -***************************** - -Metatable.txt -~~~~~~~~~~~~~ - -Click on the list of pairs you just created to see the names of the single -pairs. The names of the pairs will be the ones used by the pipeline, -therefore, these are the names that need to be used in the Metatable file. - -The Metatable file has to be in QIIME format. You can find a description -of it on the QIIME website http://qiime.org/documentation/file_formats.html - -EXAMPLE:: - - #SampleID BarcodeSequence LinkerPrimerSequence Disease Gender Description - Mock-RUN1 TAAGGCGAGCGTAAGA PsA Male Control - Mock-RUN2 CGTACTAGGCGTAAGA PsA Male Control - Mock-RUN3 AGGCAGAAGCGTAAGA PsC Female Control - -Briefly: the column ``LinkerPrimerSequence`` is empty but it cannot be -deleted. The header is very important. ``#SampleID``, ``BarcodeSequence``, -``LinkerPrimerSequence`` and ``Description`` are mandatory. Between -``LinkerPrimerSequence`` and ``Description`` you can add as many columns -as you want. 
For every column a PCoA plot will be created (see -**Results** section). You can create this file in Excel and it will have -to be saved as ``Text(Tab delimited)``. - -During the analysis the Metatable.txt will be checked to ensure that the -file has the correct format. If necessary, this will be modified and will -be available as ``Metatable_mod.txt`` in the history panel. If you are -going to use the metatable file for any other statistical analyses, -remember to use the ``Metatable_mod.txt`` one, otherwise the sample -names might not match! - -Categories.txt (optional) -~~~~~~~~~~~~~~~~~~~~~~~~~ - -This file is required if you want to get box plots for comparison of -alpha diversity indices (see **Results** section). The file is a list -(without header and IN ONE COLUMN) of categories present in the -Metatable.txt file. THE NAMES YOU ARE USING HAVE TO BE THE SAME AS THE -ONES USED IN THE METATABLE.TXT. You can create this file in Excel and -it will have to be saved as ``Text(Tab delimited)``. - -EXAMPLE:: - - Disease - Gender - -Metatable and categories files can be uploaded using Get Data as done -with the fastq files. - -4. Analysis -*********** - -Under **Amplicon_Analysis_Pipeline** - - * **Title** Name to distinguish between the runs. It will be shown at - the beginning of each output file name. - - * **Input Metatable.txt file** Select the Metatable.txt file related to - this analysis. - - * **Input Categories.txt file (Optional)** Select the Categories.txt file - related to this analysis. - - * **Input FASTQ type** Select *Dataset pairs in a collection* and then - the collection of pairs you created earlier. - - * **Forward/Reverse PCR primer sequence** If the PCR primer sequences - have not been removed by the MiSeq during fastq creation, they - have to be removed before the analysis. Insert the PCR primer sequence - in the corresponding field. DO NOT include any barcode or adapter - sequence. 
If the PCR primers have already been trimmed by the MiSeq, - and you include the sequence in this field, this will lead to an error. - Only include the sequences if they are still present in the fastq files. - - * **Threshold quality below which reads will be trimmed** Choose the - Phred score used by Sickle to trim the reads at the 3’ end. - - * **Minimum length to retain a read after trimming** If the read length - after trimming is shorter than a user-defined length, the read, along - with the corresponding read pair, will be discarded. - - * **Minimum overlap in bp between forward and reverse reads** Choose the - minimum basepair overlap used by Pandaseq to assemble the reads. - Default is 10. - - * **Minimum length in bp to keep a sequence after overlapping** Choose the - minimum sequence length used by Pandaseq to keep a sequence after the - overlapping. This depends on the expected amplicon length. Default is - 380 (used for V3-V4 16S sequencing; expected length ~440bp). - - * **Pipeline to use for analysis** Choose the pipeline to use for OTU - clustering and chimera removal. The Galaxy tool supports the ``Vsearch`` - and ``DADA2`` pipelines. - - * **Reference database** Choose between ``GreenGenes``, ``Silva`` or - ``HOMD`` (Human Oral Microbiome Database) for taxa assignment. - -Click on **Execute** to start the analysis. - -5. Results -********** - -Results are entirely generated using QIIME scripts. The results will -appear in the History panel when the analysis is completed. 
- -The following outputs are captured: - - * **Vsearch_tax_OTU_table.biom|DADA2_tax_OTU_table.biom (biom format)** - The OTU table in BIOM format (http://biom-format.org/) - - * **otus.tre** Phylogenetic tree constructed using ``make_phylogeny.py`` - (fasttree) QIIME script (http://qiime.org/scripts/make_phylogeny.html) - - * **Phylum_genus_dist_barcharts_HTML** HTML file with bar charts at - Phylum, Genus and Species level - (http://qiime.org/scripts/summarize_taxa.html and - http://qiime.org/scripts/plot_taxa_summary.html) - - * **OTUs_count_file** Summary of OTU counts per sample - (http://biom-format.org/documentation/summarizing_biom_tables.html) - - * **Table_summary_file** Summary of sequences counts per sample - (http://biom-format.org/documentation/summarizing_biom_tables.html) - - * **multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta|seqs.fa** - Fasta file with OTU sequences (Vsearch|DADA2) - - * **Heatmap_PDF** OTU heatmap in PDF format - (http://qiime.org/1.8.0/scripts/make_otu_heatmap_html.html ) - - * **Vsearch_beta_diversity_weighted_2D_plots_HTML** PCoA plots in HTML - format using weighted Unifrac distance measure. Samples are grouped - by the column names present in the Metatable file. The samples are - firstly rarefied to the minimum sequencing depth - (http://qiime.org/scripts/beta_diversity_through_plots.html ) - - * **Vsearch_beta_diversity_unweighted_2D_plots_HTML** PCoA plots in HTML - format using Unweighted Unifrac distance measure. Samples are grouped - by the column names present in the Metatable file. 
The samples are - firstly rarefied to the minimum sequencing depth - (http://qiime.org/scripts/beta_diversity_through_plots.html ) - -Code availability ------------------ - -**Code is available at** https://github.com/MTutino/Amplicon_analysis - -Credits -------- - -Pipeline author: Mauro Tutino - -Galaxy tool: Peter Briggs - - ]]></help> - <citations> - <citation type="bibtex"> - @misc{githubAmplicon_analysis, - author = {Tutino, Mauro}, - year = {2017}, - title = {Amplicon Analysis Pipeline}, - publisher = {GitHub}, - journal = {GitHub repository}, - url = {https://github.com/MTutino/Amplicon_analysis}, -}</citation> - </citations> -</tool>
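The tool help above names mandatory Metatable header columns (``#SampleID``, ``BarcodeSequence``, ``LinkerPrimerSequence``, ``Description``). As an illustrative sketch only (not part of the tool; the tab-delimited example file is created inline and is hypothetical), a quick pre-upload header check could look like this:

```shell
# Sketch: verify a QIIME-format Metatable has the mandatory header columns.
# The file is a throwaway example modelled on the help text's EXAMPLE table.
meta=$(mktemp)
printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDisease\tGender\tDescription\n' > "$meta"
printf 'Mock-RUN1\tTAAGGCGAGCGTAAGA\t\tPsA\tMale\tControl\n' >> "$meta"

missing=0
header=$(head -n 1 "$meta")
for col in '#SampleID' 'BarcodeSequence' 'LinkerPrimerSequence' 'Description' ; do
    case "$header" in
        *"$col"*) : ;;  # mandatory column found in the header line
        *) echo "MISSING mandatory column: $col" ; missing=1 ;;
    esac
done
if [ "$missing" -eq 0 ] ; then
    echo "Metatable header OK"
fi
```

A check like this catches the header problems the pipeline would otherwise only report after the analysis has started.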
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis-1.3.5.sh Thu Dec 05 11:44:03 2019 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,394 +0,0 @@ -#!/bin/sh -e -# -# Prototype script to setup a conda environment with the -# dependencies needed for the Amplicon_analysis_pipeline -# script -# -# Handle command line -usage() -{ - echo "Usage: $(basename $0) [DIR]" - echo "" - echo "Installs the Amplicon_analysis_pipeline package plus" - echo "dependencies in directory DIR (or current directory " - echo "if DIR not supplied)" -} -if [ ! -z "$1" ] ; then - # Check if help was requested - case "$1" in - --help|-h) - usage - exit 0 - ;; - esac - # Assume it's the installation directory - cd $1 -fi -# Versions -PIPELINE_VERSION=1.3.5 -CONDA_REQUIRED_VERSION=4.6.14 -RDP_CLASSIFIER_VERSION=2.2 -# Directories -TOP_DIR=$(pwd)/Amplicon_analysis-${PIPELINE_VERSION} -BIN_DIR=${TOP_DIR}/bin -CONDA_DIR=${TOP_DIR}/conda -CONDA_BIN=${CONDA_DIR}/bin -CONDA_LIB=${CONDA_DIR}/lib -CONDA=${CONDA_BIN}/conda -ENV_NAME="amplicon_analysis_pipeline@${PIPELINE_VERSION}" -ENV_DIR=${CONDA_DIR}/envs/$ENV_NAME -# -# Functions -# -# Report failure and terminate script -fail() -{ - echo "" - echo ERROR $@ >&2 - echo "" - echo "$(basename $0): installation failed" - exit 1 -} -# -# Rewrite the shebangs in the installed conda scripts -# to remove the full path to conda 'bin' directory -rewrite_conda_shebangs() -{ - pattern="s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" - find ${CONDA_BIN} -type f -exec sed -i "$pattern" {} \; -} -# -# Reset conda version if required -reset_conda_version() -{ - CONDA_VERSION="$(${CONDA_BIN}/conda -V 2>&1 | head -n 1 | cut -d' ' -f2)" - echo conda version: ${CONDA_VERSION} - if [ "${CONDA_VERSION}" != "${CONDA_REQUIRED_VERSION}" ] ; then - echo "Resetting conda to last known working version $CONDA_REQUIRED_VERSION" - ${CONDA_BIN}/conda config --set allow_conda_downgrades true - ${CONDA_BIN}/conda install -y 
conda=${CONDA_REQUIRED_VERSION} - else - echo "conda version ok" - fi -} -# -# Install conda -install_conda() -{ - echo "++++++++++++++++" - echo "Installing conda" - echo "++++++++++++++++" - if [ -e ${CONDA_DIR} ] ; then - echo "*** $CONDA_DIR already exists ***" >&2 - return - fi - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh - bash ./Miniconda2-latest-Linux-x86_64.sh -b -p ${CONDA_DIR} - echo Installed conda in ${CONDA_DIR} - # Reset the conda version to a known working version - # (to avoid problems observed with e.g. conda 4.7.10) - echo "" - reset_conda_version - # Update the installation files - # This is to avoid problems when the length the installation - # directory path exceeds the limit for the shebang statement - # in the conda files - echo "" - echo -n "Rewriting conda shebangs..." - rewrite_conda_shebangs - echo "ok" - echo -n "Adding conda bin to PATH..." - PATH=${CONDA_BIN}:$PATH - echo "ok" - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# Create conda environment -install_conda_packages() -{ - echo "+++++++++++++++++++++++++" - echo "Installing conda packages" - echo "+++++++++++++++++++++++++" - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - cat >environment.yml <<EOF -name: ${ENV_NAME} -channels: - - defaults - - conda-forge - - bioconda -dependencies: - - python=2.7 - - cutadapt=1.8 - - sickle-trim=1.33 - - bioawk=1.0 - - pandaseq=2.8.1 - - spades=3.10.1 - - fastqc=0.11.3 - - qiime=1.9.1 - - blast-legacy=2.2.26 - - fasta-splitter=0.2.6 - - rdp_classifier=$RDP_CLASSIFIER_VERSION - - vsearch=2.10.4 - - r=3.5.1 - - r-tidyverse=1.2.1 - - bioconductor-dada2=1.8 - - bioconductor-biomformat=1.8.0 -EOF - ${CONDA} env create --name "${ENV_NAME}" -f environment.yml - echo Created conda environment in ${ENV_DIR} - cd $cwd - rm -rf $wd/* - rmdir $wd - # - # Patch qiime 1.9.1 tools to switch deprecated 'axisbg' - # matplotlib property to 'facecolor': - # 
https://matplotlib.org/api/prev_api_changes/api_changes_2.0.0.html - echo "" - for exe in make_2d_plots.py plot_taxa_summary.py ; do - echo -n "Patching ${exe}..." - find ${CONDA_DIR} -type f -name "$exe" -exec sed -i 's/axisbg=/facecolor=/g' {} \; - echo "done" - done - # - # Patch qiime 1.9.1 tools to switch deprecated 'set_axis_bgcolor' - # method call to 'set_facecolor': - # https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_axis_bgcolor.html - for exe in make_rarefaction_plots.py ; do - echo -n "Patching ${exe}..." - find ${CONDA_DIR} -type f -name "$exe" -exec sed -i 's/set_axis_bgcolor/set_facecolor/g' {} \; - echo "done" - done -} -# -# Install all the non-conda dependencies in a single -# function (invokes separate functions for each package) -install_non_conda_packages() -{ - echo "+++++++++++++++++++++++++++++" - echo "Installing non-conda packages" - echo "+++++++++++++++++++++++++++++" - # Temporary working directory - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - # Amplicon analysis pipeline - echo -n "Installing Amplicon_analysis_pipeline..." - if [ -e ${BIN_DIR}/Amplicon_analysis_pipeline.sh ] ; then - echo "already installed" - else - install_amplicon_analysis_pipeline - echo "ok" - fi - # ChimeraSlayer - echo -n "Installing ChimeraSlayer..." 
- if [ -e ${BIN_DIR}/ChimeraSlayer.pl ] ; then - echo "already installed" - else - install_chimeraslayer - echo "ok" - fi - # Uclust - # This no longer seems to be available for download from - # drive5.com so don't download - echo "WARNING uclust not available: skipping installation" -} -# -# Amplicon analysis pipeline -install_amplicon_analysis_pipeline() -{ - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://github.com/MTutino/Amplicon_analysis/archive/${PIPELINE_VERSION}.tar.gz - tar zxf ${PIPELINE_VERSION}.tar.gz - cd Amplicon_analysis-${PIPELINE_VERSION} - INSTALL_DIR=${TOP_DIR}/share/amplicon_analysis_pipeline-${PIPELINE_VERSION} - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/amplicon_analysis_pipeline - for f in *.sh *.R ; do - /bin/cp $f $INSTALL_DIR - done - /bin/cp -r uc2otutab $INSTALL_DIR - mkdir -p ${BIN_DIR} - cat >${BIN_DIR}/Amplicon_analysis_pipeline.sh <<EOF -#!/usr/bin/env bash -# -# Point to Qiime config -export QIIME_CONFIG_FP=${TOP_DIR}/qiime/qiime_config -# Set up the RDP jar file -export RDP_JAR_PATH=${TOP_DIR}/share/rdp_classifier/rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar -# Set the Matplotlib backend -export MPLBACKEND="agg" -# Put the scripts onto the PATH -export PATH=${BIN_DIR}:${INSTALL_DIR}:\$PATH -# Activate the conda environment -export PATH=${CONDA_BIN}:\$PATH -source ${CONDA_BIN}/activate ${ENV_NAME} -# Execute the driver script with the supplied arguments -$INSTALL_DIR/Amplicon_analysis_pipeline.sh \$@ -exit \$? 
-EOF - chmod 0755 ${BIN_DIR}/Amplicon_analysis_pipeline.sh - cat >${BIN_DIR}/install_reference_data.sh <<EOF -#!/usr/bin/env bash -e -# -function usage() { - echo "Usage: \$(basename \$0) DIR" -} -if [ -z "\$1" ] ; then - usage - exit 0 -elif [ "\$1" == "--help" ] || [ "\$1" == "-h" ] ; then - usage - echo "" - echo "Install reference data into DIR" - exit 0 -fi -echo "==========================================" -echo "Installing Amplicon analysis pipeline data" -echo "==========================================" -if [ ! -e "\$1" ] ; then - echo "Making directory \$1" - mkdir -p \$1 -fi -cd \$1 -DATA_DIR=\$(pwd) -echo "Installing reference data under \$DATA_DIR" -$INSTALL_DIR/References.sh -echo "" -echo "Use '-r \$DATA_DIR' when running Amplicon_analysis_pipeline.sh" -echo "to use the reference data from this directory" -echo "" -echo "\$(basename \$0): finished" -EOF - chmod 0755 ${BIN_DIR}/install_reference_data.sh - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# ChimeraSlayer -install_chimeraslayer() -{ - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz - tar zxf microbiomeutil_2010-04-29.tar.gz - cd microbiomeutil_2010-04-29 - INSTALL_DIR=${TOP_DIR}/share/microbiome_chimeraslayer-2010-04-29 - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/microbiome_chimeraslayer - /bin/cp -r ChimeraSlayer $INSTALL_DIR - cat >${BIN_DIR}/ChimeraSlayer.pl <<EOF -#!/usr/bin/env bash -export PATH=$INSTALL_DIR:\$PATH -$INSTALL_DIR/ChimeraSlayer/ChimeraSlayer.pl \$@ -EOF - chmod 0755 ${INSTALL_DIR}/ChimeraSlayer/ChimeraSlayer.pl - chmod 0755 ${BIN_DIR}/ChimeraSlayer.pl - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# uclust required for QIIME/pyNAST -# License only allows this version to be used with those two packages -# See: http://drive5.com/uclust/downloads1_2_22q.html -install_uclust() -{ - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q 
http://drive5.com/uclust/uclustq1.2.22_i86linux64 - INSTALL_DIR=${TOP_DIR}/share/uclust-1.2.22 - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/uclust - /bin/mv uclustq1.2.22_i86linux64 ${INSTALL_DIR}/uclust - chmod 0755 ${INSTALL_DIR}/uclust - ln -s ${INSTALL_DIR}/uclust ${BIN_DIR} - cd $cwd - rm -rf $wd/* - rmdir $wd -} -setup_pipeline_environment() -{ - echo "+++++++++++++++++++++++++++++++" - echo "Setting up pipeline environment" - echo "+++++++++++++++++++++++++++++++" - # fasta_splitter.pl - echo -n "Setting up fasta_splitter.pl..." - if [ -e ${BIN_DIR}/fasta-splitter.pl ] ; then - echo "already exists" - elif [ ! -e ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ] ; then - echo "failed" - fail "fasta-splitter.pl not found" - else - ln -s ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ${BIN_DIR}/fasta-splitter.pl - echo "ok" - fi - # rdp_classifier.jar - local rdp_classifier_jar=rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar - echo -n "Setting up rdp_classifier.jar..." - if [ -e ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} ] ; then - echo "already exists" - elif [ ! -e ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ] ; then - echo "failed" - fail "rdp_classifier.jar not found" - else - mkdir -p ${TOP_DIR}/share/rdp_classifier - ln -s ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} - echo "ok" - fi - # qiime_config - echo -n "Setting up qiime_config..." 
- if [ -e ${TOP_DIR}/qiime/qiime_config ] ; then - echo "already exists" - else - mkdir -p ${TOP_DIR}/qiime - cat >${TOP_DIR}/qiime/qiime_config <<EOF-qiime-config -qiime_scripts_dir ${ENV_DIR}/bin -EOF-qiime-config - echo "ok" - fi -} -# -# Top level script does the installation -echo "=======================================" -echo "Amplicon_analysis_pipeline installation" -echo "=======================================" -echo "Installing into ${TOP_DIR}" -if [ -e ${TOP_DIR} ] ; then - fail "Directory already exists" -fi -mkdir -p ${TOP_DIR} -install_conda -install_conda_packages -install_non_conda_packages -setup_pipeline_environment -echo "====================================" -echo "Amplicon_analysis_pipeline installed" -echo "====================================" -echo "" -echo "Install reference data using:" -echo "" -echo "\$ ${BIN_DIR}/install_reference_data.sh DIR" -echo "" -echo "Run pipeline scripts using:" -echo "" -echo "\$ ${BIN_DIR}/Amplicon_analysis_pipeline.sh ..." -echo "" -echo "(or add ${BIN_DIR} to your PATH)" -echo "" -echo "$(basename $0): finished" -## -#
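The ``rewrite_conda_shebangs`` function in the installer above hinges on a single ``sed`` substitution that swaps a hard-coded conda ``bin`` path for ``/usr/bin/env``. Its effect can be reproduced in isolation; this sketch uses a made-up ``CONDA_BIN`` path and a scratch file, not a real conda installation:

```shell
# Demonstrate the shebang rewrite performed by rewrite_conda_shebangs.
# /tmp/demo_conda/bin is a hypothetical CONDA_BIN used only for illustration.
CONDA_BIN=/tmp/demo_conda/bin
demo=$(mktemp)
printf '#!%s/python\nprint("hello")\n' "$CONDA_BIN" > "$demo"
# Same substitution pattern as the installer (comma delimiters avoid
# clashing with the slashes in the path):
sed -i "s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" "$demo"
first_line=$(head -n 1 "$demo")
echo "$first_line"   # -> #!/usr/bin/env python
rm -f "$demo"
```

This is why the rewrite sidesteps the kernel's shebang length limit: the interpreter name survives while the long absolute prefix is replaced by an ``env`` lookup on ``PATH``.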
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis.sh Thu Dec 05 11:44:03 2019 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,425 +0,0 @@ -#!/bin/sh -e -# -# Prototype script to setup a conda environment with the -# dependencies needed for the Amplicon_analysis_pipeline -# script -# -# Handle command line -usage() -{ - echo "Usage: $(basename $0) [DIR]" - echo "" - echo "Installs the Amplicon_analysis_pipeline package plus" - echo "dependencies in directory DIR (or current directory " - echo "if DIR not supplied)" -} -if [ ! -z "$1" ] ; then - # Check if help was requested - case "$1" in - --help|-h) - usage - exit 0 - ;; - esac - # Assume it's the installation directory - cd $1 -fi -# Versions -PIPELINE_VERSION=1.2.3 -RDP_CLASSIFIER_VERSION=2.2 -# Directories -TOP_DIR=$(pwd)/Amplicon_analysis-${PIPELINE_VERSION} -BIN_DIR=${TOP_DIR}/bin -CONDA_DIR=${TOP_DIR}/conda -CONDA_BIN=${CONDA_DIR}/bin -CONDA_LIB=${CONDA_DIR}/lib -CONDA=${CONDA_BIN}/conda -ENV_NAME="amplicon_analysis_pipeline@${PIPELINE_VERSION}" -ENV_DIR=${CONDA_DIR}/envs/$ENV_NAME -# -# Functions -# -# Report failure and terminate script -fail() -{ - echo "" - echo ERROR $@ >&2 - echo "" - echo "$(basename $0): installation failed" - exit 1 -} -# -# Rewrite the shebangs in the installed conda scripts -# to remove the full path to conda 'bin' directory -rewrite_conda_shebangs() -{ - pattern="s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" - find ${CONDA_BIN} -type f -exec sed -i "$pattern" {} \; -} -# -# Install conda -install_conda() -{ - echo "++++++++++++++++" - echo "Installing conda" - echo "++++++++++++++++" - if [ -e ${CONDA_DIR} ] ; then - echo "*** $CONDA_DIR already exists ***" >&2 - return - fi - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh - bash ./Miniconda2-latest-Linux-x86_64.sh -b -p ${CONDA_DIR} - echo Installed conda in ${CONDA_DIR} - # Update the installation 
files - # This is to avoid problems when the length the installation - # directory path exceeds the limit for the shebang statement - # in the conda files - echo "" - echo -n "Rewriting conda shebangs..." - rewrite_conda_shebangs - echo "ok" - echo -n "Adding conda bin to PATH..." - PATH=${CONDA_BIN}:$PATH - echo "ok" - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# Create conda environment -install_conda_packages() -{ - echo "+++++++++++++++++++++++++" - echo "Installing conda packages" - echo "+++++++++++++++++++++++++" - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - cat >environment.yml <<EOF -name: ${ENV_NAME} -channels: - - defaults - - conda-forge - - bioconda -dependencies: - - python=2.7 - - cutadapt=1.11 - - sickle-trim=1.33 - - bioawk=1.0 - - pandaseq=2.8.1 - - spades=3.5.0 - - fastqc=0.11.3 - - qiime=1.8.0 - - blast-legacy=2.2.26 - - fasta-splitter=0.2.4 - - rdp_classifier=$RDP_CLASSIFIER_VERSION - - vsearch=1.1.3 - # Need to explicitly specify libgfortran - # version (otherwise get version incompatible - # with numpy=1.7.1) - - libgfortran=1.0 - # Compilers needed to build R - - gcc_linux-64 - - gxx_linux-64 - - gfortran_linux-64 -EOF - ${CONDA} env create --name "${ENV_NAME}" -f environment.yml - echo Created conda environment in ${ENV_DIR} - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# Install all the non-conda dependencies in a single -# function (invokes separate functions for each package) -install_non_conda_packages() -{ - echo "+++++++++++++++++++++++++++++" - echo "Installing non-conda packages" - echo "+++++++++++++++++++++++++++++" - # Temporary working directory - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - # Amplicon analysis pipeline - echo -n "Installing Amplicon_analysis_pipeline..." - if [ -e ${BIN_DIR}/Amplicon_analysis_pipeline.sh ] ; then - echo "already installed" - else - install_amplicon_analysis_pipeline - echo "ok" - fi - # ChimeraSlayer - echo -n "Installing ChimeraSlayer..." 
- if [ -e ${BIN_DIR}/ChimeraSlayer.pl ] ; then - echo "already installed" - else - install_chimeraslayer - echo "ok" - fi - # Uclust - echo -n "Installing uclust for QIIME/pyNAST..." - if [ -e ${BIN_DIR}/uclust ] ; then - echo "already installed" - else - install_uclust - echo "ok" - fi - # R 3.2.1 - echo -n "Checking for R 3.2.1..." - if [ -e ${BIN_DIR}/R ] ; then - echo "R already installed" - else - echo "not found" - install_R_3_2_1 - fi -} -# -# Amplicon analysis pipeline -install_amplicon_analysis_pipeline() -{ - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://github.com/MTutino/Amplicon_analysis/archive/v${PIPELINE_VERSION}.tar.gz - tar zxf v${PIPELINE_VERSION}.tar.gz - cd Amplicon_analysis-${PIPELINE_VERSION} - INSTALL_DIR=${TOP_DIR}/share/amplicon_analysis_pipeline-${PIPELINE_VERSION} - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/amplicon_analysis_pipeline - for f in *.sh ; do - /bin/cp $f $INSTALL_DIR - done - /bin/cp -r uc2otutab $INSTALL_DIR - mkdir -p ${BIN_DIR} - cat >${BIN_DIR}/Amplicon_analysis_pipeline.sh <<EOF -#!/usr/bin/env bash -# -# Point to Qiime config -export QIIME_CONFIG_FP=${TOP_DIR}/qiime/qiime_config -# Set up the RDP jar file -export RDP_JAR_PATH=${TOP_DIR}/share/rdp_classifier/rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar -# Put the scripts onto the PATH -export PATH=${BIN_DIR}:${INSTALL_DIR}:\$PATH -# Activate the conda environment -export PATH=${CONDA_BIN}:\$PATH -source ${CONDA_BIN}/activate ${ENV_NAME} -# Execute the driver script with the supplied arguments -$INSTALL_DIR/Amplicon_analysis_pipeline.sh \$@ -exit \$? 
-EOF - chmod 0755 ${BIN_DIR}/Amplicon_analysis_pipeline.sh - cat >${BIN_DIR}/install_reference_data.sh <<EOF -#!/usr/bin/env bash -e -# -function usage() { - echo "Usage: \$(basename \$0) DIR" -} -if [ -z "\$1" ] ; then - usage - exit 0 -elif [ "\$1" == "--help" ] || [ "\$1" == "-h" ] ; then - usage - echo "" - echo "Install reference data into DIR" - exit 0 -fi -echo "==========================================" -echo "Installing Amplicon analysis pipeline data" -echo "==========================================" -if [ ! -e "\$1" ] ; then - echo "Making directory \$1" - mkdir -p \$1 -fi -cd \$1 -DATA_DIR=\$(pwd) -echo "Installing reference data under \$DATA_DIR" -$INSTALL_DIR/References.sh -echo "" -echo "Use '-r \$DATA_DIR' when running Amplicon_analysis_pipeline.sh" -echo "to use the reference data from this directory" -echo "" -echo "\$(basename \$0): finished" -EOF - chmod 0755 ${BIN_DIR}/install_reference_data.sh - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# ChimeraSlayer -install_chimeraslayer() -{ - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz - tar zxf microbiomeutil_2010-04-29.tar.gz - cd microbiomeutil_2010-04-29 - INSTALL_DIR=${TOP_DIR}/share/microbiome_chimeraslayer-2010-04-29 - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/microbiome_chimeraslayer - /bin/cp -r ChimeraSlayer $INSTALL_DIR - cat >${BIN_DIR}/ChimeraSlayer.pl <<EOF -#!/usr/bin/env bash -export PATH=$INSTALL_DIR:\$PATH -$INSTALL_DIR/ChimeraSlayer/ChimeraSlayer.pl \$@ -EOF - chmod 0755 ${INSTALL_DIR}/ChimeraSlayer/ChimeraSlayer.pl - chmod 0755 ${BIN_DIR}/ChimeraSlayer.pl - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# uclust required for QIIME/pyNAST -# License only allows this version to be used with those two packages -# See: http://drive5.com/uclust/downloads1_2_22q.html -install_uclust() -{ - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd 
$wd - wget -q http://drive5.com/uclust/uclustq1.2.22_i86linux64 - INSTALL_DIR=${TOP_DIR}/share/uclust-1.2.22 - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/uclust - /bin/mv uclustq1.2.22_i86linux64 ${INSTALL_DIR}/uclust - chmod 0755 ${INSTALL_DIR}/uclust - ln -s ${INSTALL_DIR}/uclust ${BIN_DIR} - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# R 3.2.1 -# Can't use version from conda due to dependency conflicts -install_R_3_2_1() -{ - . ${CONDA_BIN}/activate ${ENV_NAME} - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - echo -n "Fetching R 3.2.1 source code..." - wget -q http://cran.r-project.org/src/base/R-3/R-3.2.1.tar.gz - echo "ok" - INSTALL_DIR=${TOP_DIR} - mkdir -p $INSTALL_DIR - echo -n "Unpacking source code..." - tar xzf R-3.2.1.tar.gz >INSTALL.log 2>&1 - echo "ok" - cd R-3.2.1 - echo -n "Running configure..." - ./configure --prefix=$INSTALL_DIR --with-x=no --with-readline=no >>INSTALL.log 2>&1 - echo "ok" - echo -n "Running make..." - make >>INSTALL.log 2>&1 - echo "ok" - echo -n "Running make install..." - make install >>INSTALL.log 2>&1 - echo "ok" - cd $cwd - rm -rf $wd/* - rmdir $wd - . ${CONDA_BIN}/deactivate -} -setup_pipeline_environment() -{ - echo "+++++++++++++++++++++++++++++++" - echo "Setting up pipeline environment" - echo "+++++++++++++++++++++++++++++++" - # vsearch113 - echo -n "Setting up vsearch113..." - if [ -e ${BIN_DIR}/vsearch113 ] ; then - echo "already exists" - elif [ ! -e ${ENV_DIR}/bin/vsearch ] ; then - echo "failed" - fail "vsearch not found" - else - ln -s ${ENV_DIR}/bin/vsearch ${BIN_DIR}/vsearch113 - echo "ok" - fi - # fasta_splitter.pl - echo -n "Setting up fasta_splitter.pl..." - if [ -e ${BIN_DIR}/fasta-splitter.pl ] ; then - echo "already exists" - elif [ ! 
-e ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ] ; then - echo "failed" - fail "fasta-splitter.pl not found" - else - ln -s ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ${BIN_DIR}/fasta-splitter.pl - echo "ok" - fi - # rdp_classifier.jar - local rdp_classifier_jar=rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar - echo -n "Setting up rdp_classifier.jar..." - if [ -e ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} ] ; then - echo "already exists" - elif [ ! -e ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ] ; then - echo "failed" - fail "rdp_classifier.jar not found" - else - mkdir -p ${TOP_DIR}/share/rdp_classifier - ln -s ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} - echo "ok" - fi - # qiime_config - echo -n "Setting up qiime_config..." - if [ -e ${TOP_DIR}/qiime/qiime_config ] ; then - echo "already exists" - else - mkdir -p ${TOP_DIR}/qiime - cat >${TOP_DIR}/qiime/qiime_config <<EOF-qiime-config -qiime_scripts_dir ${ENV_DIR}/bin -EOF-qiime-config - echo "ok" - fi -} -# -# Remove the compilers from the conda environment -# Not sure if this step is necessary -remove_conda_compilers() -{ - echo "+++++++++++++++++++++++++++++++++++++++++" - echo "Removing compilers from conda environment" - echo "+++++++++++++++++++++++++++++++++++++++++" - ${CONDA} remove -y -n ${ENV_NAME} gcc_linux-64 gxx_linux-64 gfortran_linux-64 -} -# -# Top level script does the installation -echo "=======================================" -echo "Amplicon_analysis_pipeline installation" -echo "=======================================" -echo "Installing into ${TOP_DIR}" -if [ -e ${TOP_DIR} ] ; then - fail "Directory already exists" -fi -mkdir -p ${TOP_DIR} -install_conda -install_conda_packages -install_non_conda_packages -setup_pipeline_environment -remove_conda_compilers -echo "====================================" -echo "Amplicon_analysis_pipeline installed" -echo "====================================" -echo "" 
-echo "Install reference data using:" -echo "" -echo "\$ ${BIN_DIR}/install_reference_data.sh DIR" -echo "" -echo "Run pipeline scripts using:" -echo "" -echo "\$ ${BIN_DIR}/Amplicon_analysis_pipeline.sh ..." -echo "" -echo "(or add ${BIN_DIR} to your PATH)" -echo "" -echo "$(basename $0): finished" -## -#
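The installer above repeatedly uses one pattern worth noting: an unquoted heredoc writes a small wrapper script, so install-time values such as `$INSTALL_DIR` are baked in when the wrapper is generated, while escaped references (`\$PATH`, `\$0`) are left to expand at run time. A standalone sketch of that pattern (all paths here are illustrative stand-ins, not the pipeline's real ones):

```shell
# Demonstrates the heredoc wrapper-generation pattern used by the installer.
# INSTALL_DIR is a stand-in value; BIN_DIR is a scratch directory.
INSTALL_DIR=/tmp/demo_install
BIN_DIR=$(mktemp -d)
cat >"${BIN_DIR}/demo_tool.sh" <<EOF
#!/usr/bin/env bash
# \$INSTALL_DIR below was expanded when this wrapper was written;
# \$PATH was escaped, so it expands each time the wrapper runs.
export PATH=$INSTALL_DIR:\$PATH
echo "PATH starts with: \${PATH%%:*}"
EOF
chmod 0755 "${BIN_DIR}/demo_tool.sh"
"${BIN_DIR}/demo_tool.sh"   # prints: PATH starts with: /tmp/demo_install
```

The same quoting rules explain why the generated `Amplicon_analysis_pipeline.sh` and `install_reference_data.sh` wrappers mix escaped and unescaped variables.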
Binary file Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig1.png has changed
Binary file Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig2.png has changed
Binary file Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig3.png has changed
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/tool_dependencies.xml Thu Dec 05 11:44:03 2019 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,16 +0,0 @@ -<?xml version="1.0"?> -<tool_dependency> - <package name="amplicon_analysis_pipeline" version="1.3.5"> - <install version="1.0"> - <actions> - <action type="download_file">https://raw.githubusercontent.com/pjbriggs/Amplicon_analysis-galaxy/update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis-1.3.5.sh</action> - <action type="shell_command"> - sh ./install_amplicon_analysis.sh $INSTALL_DIR - </action> - <action type="set_environment"> - <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/Amplicon_analysis-1.3.5/bin</environment_variable> - </action> - </actions> - </install> - </package> -</tool_dependency>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README.rst Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,213 @@ +Amplicon_analysis-galaxy +======================== + +A Galaxy tool wrapper to Mauro Tutino's ``Amplicon_analysis`` pipeline +script at https://github.com/MTutino/Amplicon_analysis + +The pipeline can analyse paired-end 16S rRNA data from Illumina Miseq +(Casava >= 1.8) and performs the following operations: + + * QC and clean up of input data + * Removal of singletons and chimeras and building of OTU table + and phylogenetic tree + * Beta and alpha diversity analysis + +Usage documentation +=================== + +Usage of the tool (including required inputs) is documented within +the ``help`` section of the tool XML. + +Installing the tool in a Galaxy instance +======================================== + +The following sections describe how to install the tool files, +dependencies and reference data, and how to configure the Galaxy +instance to detect the dependencies and reference data correctly +at run time. + +1. Install the tool from the toolshed +------------------------------------- + +The core tool is hosted on the Galaxy toolshed, so it can be installed +directly from there (this is the recommended route): + + * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/ + +Alternatively it can be installed manually; in this case there are two +files to install: + + * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition) + * ``amplicon_analysis_pipeline.py`` (the Python wrapper script) + +Put these in a directory that is visible to Galaxy (e.g. a +``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml`` +file to tell Galaxy to offer the tool by adding a line e.g.:: + + <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" /> + +2. 
Install the reference data +----------------------------- + +The script ``References.sh`` from the pipeline package at +https://github.com/MTutino/Amplicon_analysis can be run to install +the reference data, for example:: + + cd /path/to/pipeline/data + wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh + /bin/bash ./References.sh + +will install the data in ``/path/to/pipeline/data``. + +**NB** The final amount of data downloaded and uncompressed will be +around 9GB. + +3. Configure reference data location in Galaxy +---------------------------------------------- + +The final step is to make your Galaxy installation aware of the +location of the reference data, so it can locate it when the +tool is run. + +The tool locates the reference data via an environment variable called +``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent +directory where the reference data has been installed. + +There are various ways to do this, depending on how your Galaxy +installation is configured: + + * **For local instances:** add a line to set it in the + ``config/local_env.sh`` file of your Galaxy installation (you + may need to create a new empty file first), e.g.:: + + export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data + + * **For production instances:** set the value in the ``job_conf.xml`` + configuration file, e.g.:: + + <destination id="amplicon_analysis"> + <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env> + </destination> + + and then specify that the pipeline tool uses this destination:: + + <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/> + + (For more about job destinations see the Galaxy documentation at + https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations) + +4. 
Enable rendering of HTML outputs from pipeline +------------------------------------------------- + +To ensure that HTML outputs are displayed correctly in Galaxy +(for example the Vsearch OTU table heatmaps), Galaxy needs to be +configured not to sanitize the outputs from the ``Amplicon_analysis`` +tool. + +Either: + + * **For local instances:** set ``sanitize_all_html = False`` in + ``config/galaxy.ini`` (nb don't do this on production servers or + public instances!); or + + * **For production instances:** add the ``Amplicon_analysis`` tool + to the display whitelist in the Galaxy instance: + + - Set ``sanitize_whitelist_file = config/whitelist.txt`` in + ``config/galaxy.ini`` and restart Galaxy; + - Go to ``Admin>Manage Display Whitelist``, check the box for + ``Amplicon_analysis`` (hint: use your browser's 'find-in-page' + search function to help locate it) and click on + ``Submit new whitelist`` to update the settings. + +Additional details +================== + +Some other things to be aware of: + + * Note that using the Silva database requires a minimum of 18Gb RAM + +Known problems +============== + + * Only the ``VSEARCH`` pipeline in Mauro's script is currently + available via the Galaxy tool; the ``USEARCH`` and ``QIIME`` + pipelines have yet to be implemented. + * The images in the tool help section are not visible if the + tool has been installed locally, or if it has been installed in + a Galaxy instance which is served from a subdirectory. + + These are both problems with Galaxy and not the tool, see + https://github.com/galaxyproject/galaxy/issues/4490 and + https://github.com/galaxyproject/galaxy/issues/1676 + +Appendix: installing the dependencies manually +============================================== + +If the tool is installed from the Galaxy toolshed (recommended) then +the dependencies should be installed automatically and this step can +be skipped. 
+ +Otherwise the ``install_amplicon_analysis.sh`` script can be used +to fetch and install the dependencies locally, for example:: + + install_amplicon_analysis.sh /path/to/local_tool_dependencies + +(This is the same script as is used to install dependencies from the +toolshed.) This can take some time to complete, and when completed will +have created a directory called ``Amplicon_analysis-1.2.3`` containing +the dependencies under the specified top level directory. + +**NB** The installed dependencies will occupy around 2.6GB of disk +space. + +You will need to make sure that the ``bin`` subdirectory of this +directory is on Galaxy's ``PATH`` at runtime, for the tool to be able +to access the dependencies - for example by adding a line to the +``local_env.sh`` file like:: + + export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH + +History +======= + +========== ====================================================================== +Version Changes +---------- ---------------------------------------------------------------------- +1.3.5.0 Updated to Amplicon_Analysis_Pipeline version 1.3.5. +1.2.3.0 Updated to Amplicon_Analysis_Pipeline version 1.2.3; install + dependencies via tool_dependencies.xml. +1.2.2.0 Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes + jackknifed analysis which is not captured by Galaxy tool) +1.2.1.0 Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds + option to use the Human Oral Microbiome Database v15.1, and + updates SILVA database to v123) +1.1.0 First official version on Galaxy toolshed. +1.0.6 Expand inline documentation to provide detailed usage guidance. 
+1.0.5 Updates including: + + - Capture read counts from quality control as new output dataset + - Capture FastQC per-base quality boxplots for each sample as + new output dataset + - Add support for -l option (sliding window length for trimming) + - Default for -L set to "200" +1.0.4 Various updates: + + - Additional outputs are captured when a "Categories" file is + supplied (alpha diversity rarefaction curves and boxplots) + - Sample names derived from Fastqs in a collection of pairs + are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames) + - Input Fastqs can now be of more general ``fastq`` type + - Log file outputs are captured in new output dataset + - User can specify a "title" for the job which is copied into + the dataset names (to distinguish outputs from different runs) + - Improved detection and reporting of problems with input + Metatable +1.0.3 Take the sample names from the collection dataset names when + using collection as input (this is now the default input mode); + collect additional output dataset; disable ``usearch``-based + pipelines (i.e. ``UPARSE`` and ``QIIME``). +1.0.2 Enable support for FASTQs supplied via dataset collections and + fix some broken output datasets. +1.0.1 Initial version +========== ======================================================================
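The README's configuration steps for a local instance reduce to two environment settings; a consolidated sketch of a `config/local_env.sh` fragment (both paths are placeholders to adjust to your installation):

```shell
# Placeholder paths -- point these at your actual reference-data and
# dependency install locations before use.
export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data
export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH
```

Galaxy sources this file at startup, so both the wrapper script and the pipeline dependencies become visible to tool jobs.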
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/amplicon_analysis_pipeline.py Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,370 @@ +#!/usr/bin/env python +# +# Wrapper script to run Amplicon_analysis_pipeline.sh +# from Galaxy tool + +import sys +import os +import argparse +import subprocess +import glob + +class PipelineCmd(object): + def __init__(self,cmd): + self.cmd = [str(cmd)] + def add_args(self,*args): + for arg in args: + self.cmd.append(str(arg)) + def __repr__(self): + return ' '.join([str(arg) for arg in self.cmd]) + +def ahref(target,name=None,type=None): + if name is None: + name = os.path.basename(target) + ahref = "<a href='%s'" % target + if type is not None: + ahref += " type='%s'" % type + ahref += ">%s</a>" % name + return ahref + +def check_errors(): + # Errors in Amplicon_analysis_pipeline.log + with open('Amplicon_analysis_pipeline.log','r') as pipeline_log: + log = pipeline_log.read() + if "Names in the first column of Metatable.txt and in the second column of Final_name.txt do not match" in log: + print_error("""*** Sample IDs don't match dataset names *** + +The sample IDs (first column of the Metatable file) don't match the +supplied sample names for the input Fastq pairs. 
+""") + # Errors in pipeline output + with open('pipeline.log','r') as pipeline_log: + log = pipeline_log.read() + if "Errors and/or warnings detected in mapping file" in log: + with open("Metatable_log/Metatable.log","r") as metatable_log: + # Echo the Metatable log file to the tool log + print_error("""*** Error in Metatable mapping file *** + +%s""" % metatable_log.read()) + elif "No header line was found in mapping file" in log: + # Report error to the tool log + print_error("""*** No header in Metatable mapping file *** + +Check you've specified the correct file as the input Metatable""") + +def print_error(message): + width = max([len(line) for line in message.split('\n')]) + 4 + sys.stderr.write("\n%s\n" % ('*'*width)) + for line in message.split('\n'): + sys.stderr.write("* %s%s *\n" % (line,' '*(width-len(line)-4))) + sys.stderr.write("%s\n\n" % ('*'*width)) + +def clean_up_name(sample): + # Remove extensions and trailing "_L[0-9]+_001" from + # Fastq pair names + sample_name = '.'.join(sample.split('.')[:1]) + split_name = sample_name.split('_') + if split_name[-1] == "001": + split_name = split_name[:-1] + if split_name[-1].startswith('L'): + try: + int(split_name[-1][1:]) + split_name = split_name[:-1] + except ValueError: + pass + return '_'.join(split_name) + +def list_outputs(filen=None): + # List the output directory contents + # If filen is specified then will be the filename to + # write to, otherwise write to stdout + if filen is not None: + fp = open(filen,'w') + else: + fp = sys.stdout + results_dir = os.path.abspath("RESULTS") + fp.write("Listing contents of output dir %s:\n" % results_dir) + ix = 0 + for d,dirs,files in os.walk(results_dir): + ix += 1 + fp.write("-- %d: %s\n" % (ix, + os.path.relpath(d,results_dir))) + for f in files: + ix += 1 + fp.write("---- %d: %s\n" % (ix, + os.path.relpath(f,results_dir))) + # Close output file + if filen is not None: + fp.close() + +if __name__ == "__main__": + # Command line + print "Amplicon 
analysis: starting" + p = argparse.ArgumentParser() + p.add_argument("metatable", + metavar="METATABLE_FILE", + help="Metatable.txt file") + p.add_argument("fastq_pairs", + metavar="SAMPLE_NAME FQ_R1 FQ_R2", + nargs="+", + default=list(), + help="Triplets of SAMPLE_NAME followed by " + "a R1/R2 FASTQ file pair") + p.add_argument("-g",dest="forward_pcr_primer") + p.add_argument("-G",dest="reverse_pcr_primer") + p.add_argument("-q",dest="trimming_threshold") + p.add_argument("-O",dest="minimum_overlap") + p.add_argument("-L",dest="minimum_length") + p.add_argument("-l",dest="sliding_window_length") + p.add_argument("-P",dest="pipeline", + choices=["Vsearch","DADA2"], + type=str, + default="Vsearch") + p.add_argument("-S",dest="use_silva",action="store_true") + p.add_argument("-H",dest="use_homd",action="store_true") + p.add_argument("-r",dest="reference_data_path") + p.add_argument("-c",dest="categories_file") + args = p.parse_args() + + # Build the environment for running the pipeline + print "Amplicon analysis: building the environment" + metatable_file = os.path.abspath(args.metatable) + os.symlink(metatable_file,"Metatable.txt") + print "-- made symlink to Metatable.txt" + + # Link to Categories.txt file (if provided) + if args.categories_file is not None: + categories_file = os.path.abspath(args.categories_file) + os.symlink(categories_file,"Categories.txt") + print "-- made symlink to Categories.txt" + + # Link to FASTQs and construct Final_name.txt file + sample_names = [] + print "-- making Final_name.txt" + with open("Final_name.txt",'w') as final_name: + fastqs = iter(args.fastq_pairs) + for sample_name,fqr1,fqr2 in zip(fastqs,fastqs,fastqs): + sample_name = clean_up_name(sample_name) + print " %s" % sample_name + r1 = "%s_R1_.fastq" % sample_name + r2 = "%s_R2_.fastq" % sample_name + os.symlink(fqr1,r1) + os.symlink(fqr2,r2) + final_name.write("%s\n" % '\t'.join((r1,sample_name))) + final_name.write("%s\n" % '\t'.join((r2,sample_name))) + 
sample_names.append(sample_name) + + # Reference database + if args.use_silva: + ref_database = "silva" + elif args.use_homd: + ref_database = "homd" + else: + ref_database = "gg" + + # Construct the pipeline command + print "Amplicon analysis: constructing pipeline command" + pipeline = PipelineCmd("Amplicon_analysis_pipeline.sh") + if args.forward_pcr_primer: + pipeline.add_args("-g",args.forward_pcr_primer) + if args.reverse_pcr_primer: + pipeline.add_args("-G",args.reverse_pcr_primer) + if args.trimming_threshold: + pipeline.add_args("-q",args.trimming_threshold) + if args.minimum_overlap: + pipeline.add_args("-O",args.minimum_overlap) + if args.minimum_length: + pipeline.add_args("-L",args.minimum_length) + if args.sliding_window_length: + pipeline.add_args("-l",args.sliding_window_length) + if args.reference_data_path: + pipeline.add_args("-r",args.reference_data_path) + pipeline.add_args("-P",args.pipeline) + if ref_database == "silva": + pipeline.add_args("-S") + elif ref_database == "homd": + pipeline.add_args("-H") + + # Echo the pipeline command to stdout + print "Running %s" % pipeline + + # Run the pipeline + with open("pipeline.log","w") as pipeline_out: + try: + subprocess.check_call(pipeline.cmd, + stdout=pipeline_out, + stderr=subprocess.STDOUT) + exit_code = 0 + print "Pipeline completed ok" + except subprocess.CalledProcessError as ex: + # Non-zero exit status + sys.stderr.write("Pipeline failed: exit code %s\n" % + ex.returncode) + exit_code = ex.returncode + except Exception as ex: + # Some other problem + sys.stderr.write("Unexpected error: %s\n" % str(ex)) + exit_code = 1 + + # Write out the list of outputs + outputs_file = "Pipeline_outputs.txt" + list_outputs(outputs_file) + + # Check for log file + log_file = "Amplicon_analysis_pipeline.log" + if os.path.exists(log_file): + print "Found log file: %s" % log_file + if exit_code == 0: + # Create an HTML file to link to log files etc + # NB the paths to the files should be correct once + # 
copied by Galaxy on job completion + with open("pipeline_outputs.html","w") as html_out: + html_out.write("""<html> +<head> +<title>Amplicon analysis pipeline: log files</title> +</head> +<body> +<h1>Amplicon analysis pipeline: log files</h1> +<ul> +""") + html_out.write( + "<li>%s</li>\n" % + ahref("Amplicon_analysis_pipeline.log", + type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("pipeline.log",type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("Pipeline_outputs.txt", + type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("Metatable.html")) + html_out.write("""</ul> +</body> +</html> +""") + else: + # Check for known error messages + check_errors() + # Write pipeline stdout to tool stderr + sys.stderr.write("\nOutput from pipeline:\n") + with open("pipeline.log",'r') as log: + sys.stderr.write("%s" % log.read()) + # Write log file contents to tool log + print "\nAmplicon_analysis_pipeline.log:" + with open(log_file,'r') as log: + print "%s" % log.read() + else: + sys.stderr.write("ERROR missing log file \"%s\"\n" % + log_file) + + # Handle FastQC boxplots + print "Amplicon analysis: collating per base quality boxplots" + with open("fastqc_quality_boxplots.html","w") as quality_boxplots: + # PHRED value for trimming + phred_score = 20 + if args.trimming_threshold is not None: + phred_score = args.trimming_threshold + # Write header for HTML output file + quality_boxplots.write("""<html> +<head> +<title>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</title> +</head> +<body> +<h1>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</h1> +""") + # Look for raw and trimmed FastQC output for each sample + for sample_name in sample_names: + fastqc_dir = os.path.join(sample_name,"FastQC") + quality_boxplots.write("<h2>%s</h2>" % sample_name) + for d in ("Raw","cutdapt_sickle/Q%s" % phred_score): + quality_boxplots.write("<h3>%s</h3>" % d) + fastqc_html_files = glob.glob( + 
os.path.join(fastqc_dir,d,"*_fastqc.html")) + if not fastqc_html_files: + quality_boxplots.write("<p>No FastQC outputs found</p>") + continue + # Pull out the per-base quality boxplots + for f in fastqc_html_files: + boxplot = None + with open(f) as fp: + for line in fp.read().split(">"): + try: + line.index("alt=\"Per base quality graph\"") + boxplot = line + ">" + break + except ValueError: + pass + if boxplot is None: + boxplot = "Missing plot" + quality_boxplots.write("<h4>%s</h4><p>%s</p>" % + (os.path.basename(f), + boxplot)) + quality_boxplots.write("""</body> +</html> +""") + + # Handle DADA2 error rate plot PDFs + if args.pipeline == "DADA2": + print("Amplicon analysis: collecting error rate plots") + error_rate_plots_dir = os.path.abspath( + os.path.join("DADA2_OTU_tables", + "Error_rate_plots")) + error_rate_plot_pdfs = [os.path.basename(pdf) + for pdf in + sorted(glob.glob( + os.path.join(error_rate_plots_dir,"*.pdf")))] + with open("error_rate_plots.html","w") as error_rate_plots_out: + error_rate_plots_out.write("""<html> +<head> +<title>Amplicon analysis pipeline: DADA2 Error Rate Plots</title> +</head> +<body> +<h1>Amplicon analysis pipeline: DADA2 Error Rate Plots</h1> +""") + error_rate_plots_out.write("<ul>\n") + for pdf in error_rate_plot_pdfs: + error_rate_plots_out.write("<li>%s</li>\n" % ahref(pdf)) + error_rate_plots_out.write("</ul>\n") + error_rate_plots_out.write("""</body> +</html> +""") + + # Handle additional output when categories file was supplied + if args.categories_file is not None: + # Alpha diversity boxplots + print "Amplicon analysis: indexing alpha diversity boxplots" + boxplots_dir = os.path.abspath( + os.path.join("RESULTS", + "%s_%s" % (args.pipeline, + ref_database), + "Alpha_diversity", + "Alpha_diversity_boxplot", + "Categories_shannon")) + print "Amplicon analysis: gathering PDFs from %s" % boxplots_dir + boxplot_pdfs = [os.path.basename(pdf) + for pdf in + sorted(glob.glob( + os.path.join(boxplots_dir,"*.pdf")))] + 
with open("alpha_diversity_boxplots.html","w") as boxplots_out: + boxplots_out.write("""<html> +<head> +<title>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</title> +</head> +<body> +<h1>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</h1> +""") + boxplots_out.write("<ul>\n") + for pdf in boxplot_pdfs: + boxplots_out.write("<li>%s</li>\n" % ahref(pdf)) + boxplots_out.write("</ul>\n") + boxplots_out.write("""</body> +</html> +""") + + # Finish + print "Amplicon analysis: finishing, exit code: %s" % exit_code + sys.exit(exit_code)
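The sample-name trimming performed by the wrapper's `clean_up_name` can be exercised in isolation. This sketch mirrors the wrapper's logic (rewritten for Python 3, unlike the Python 2 wrapper above; sample names are illustrative):

```python
def clean_up_name(sample):
    # Drop file extensions, then strip a trailing "_001" and "_L<nnn>"
    # component typical of Illumina-style Fastq names (mirrors the
    # wrapper's clean_up_name logic).
    sample_name = sample.split('.')[0]
    parts = sample_name.split('_')
    if parts[-1] == "001":
        parts = parts[:-1]
    if parts[-1].startswith('L'):
        try:
            int(parts[-1][1:])
            parts = parts[:-1]
        except ValueError:
            pass
    return '_'.join(parts)

print(clean_up_name("PC1_S1_L001_001"))        # -> PC1_S1
print(clean_up_name("Sample2_S4_L002.fastq"))  # -> Sample2_S4
```

Note that only one trailing lane component is stripped, which matches the README's statement that names are trimmed to the `SAMPLE_S*` form for Illumina-style filenames.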
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/amplicon_analysis_pipeline.xml Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,502 @@ +<tool id="amplicon_analysis_pipeline" name="Amplicon Analysis Pipeline" version="1.3.5.0"> + <description>analyse 16S rRNA data from Illumina Miseq paired-end reads</description> + <requirements> + <requirement type="package" version="1.3.5">amplicon_analysis_pipeline</requirement> + </requirements> + <stdio> + <exit_code range="1:" /> + </stdio> + <command><![CDATA[ + + ## Convenience variable for pipeline name + #set $pipeline_name = $pipeline.pipeline_name + + ## Set the reference database name + #if str( $pipeline_name ) == "DADA2" + #set reference_database_name = "silva" + #else + #set reference_database = $pipeline.reference_database + #if $reference_database == "-S" + #set reference_database_name = "silva" + #else if $reference_database == "-H" + #set reference_database_name = "homd" + #else + #set reference_database_name = "gg" + #end if + #end if + + ## Run the amplicon analysis pipeline wrapper + python $__tool_directory__/amplicon_analysis_pipeline.py + ## Set options + #if str( $forward_pcr_primer ) != "" + -g "$forward_pcr_primer" + #end if + #if str( $reverse_pcr_primer ) != "" + -G "$reverse_pcr_primer" + #end if + #if str( $trimming_threshold ) != "" + -q $trimming_threshold + #end if + #if str( $sliding_window_length ) != "" + -l $sliding_window_length + #end if + #if str( $minimum_overlap ) != "" + -O $minimum_overlap + #end if + #if str( $minimum_length ) != "" + -L $minimum_length + #end if + -P $pipeline_name + -r \${AMPLICON_ANALYSIS_REF_DATA_PATH-ReferenceData} + #if str( $pipeline_name ) != "DADA2" + ${reference_database} + #end if + #if str($categories_file_in) != 'None' + -c "${categories_file_in}" + #end if + ## Input files + "${metatable_file_in}" + ## FASTQ pairs + #if str($input_type.pairs_or_collection) == "collection" + #set fastq_pairs = $input_type.fastq_collection + #else + #set fastq_pairs = 
$input_type.fastq_pairs + #end if + #for $fq_pair in $fastq_pairs + "${fq_pair.name}" "${fq_pair.forward}" "${fq_pair.reverse}" + #end for + && + + ## Collect outputs + cp Metatable_log/Metatable_mod.txt "${metatable_mod}" && + #if str( $pipeline_name ) == "Vsearch" + # Vsearch-specific + cp ${pipeline_name}_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom "${tax_otu_table_biom_file}" && + cp Multiplexed_files/${pipeline_name}_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta "${dereplicated_nonchimera_otus_fasta}" && + cp QUALITY_CONTROL/Reads_count.txt "$read_counts_out" && + #else + # DADA2-specific + cp ${pipeline_name}_OTU_tables/DADA2_tax_OTU_table.biom "${tax_otu_table_biom_file}" && + cp ${pipeline_name}_OTU_tables/seqs.fa "${dereplicated_nonchimera_otus_fasta}" && + #end if + cp ${pipeline_name}_OTU_tables/otus.tre "${otus_tre_file}" && + cp RESULTS/${pipeline_name}_${reference_database_name}/OTUs_count.txt "${otus_count_file}" && + cp RESULTS/${pipeline_name}_${reference_database_name}/table_summary.txt "${table_summary_file}" && + cp fastqc_quality_boxplots.html "${fastqc_quality_boxplots_html}" && + + ## OTU table heatmap + cp RESULTS/${pipeline_name}_${reference_database_name}/Heatmap.pdf "${heatmap_otu_table_pdf}"" && + + ## HTML outputs + + ## Phylum genus barcharts + mkdir $phylum_genus_dist_barcharts_html.files_path && + cp -r RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/charts $phylum_genus_dist_barcharts_html.files_path && + cp -r RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/raw_data $phylum_genus_dist_barcharts_html.files_path && + cp RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/bar_charts.html "${phylum_genus_dist_barcharts_html}" && + + ## Beta diversity weighted 2d plots + mkdir $beta_div_even_weighted_2d_plots.files_path && + cp -r 
RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/weighted_2d_plot/* $beta_div_even_weighted_2d_plots.files_path && + cp RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_weighted_2d_plots}" && + + ## Beta diversity unweighted 2d plots + mkdir $beta_div_even_unweighted_2d_plots.files_path && + cp -r RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/unweighted_2d_plot/* $beta_div_even_unweighted_2d_plots.files_path && + cp RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_unweighted_2d_plots}" && + + ## Alpha diversity rarefaction plots + mkdir $alpha_div_rarefaction_plots.files_path && + cp RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/rarefaction_curves/rarefaction_plots.html $alpha_div_rarefaction_plots && + cp -r RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/rarefaction_curves/average_plots $alpha_div_rarefaction_plots.files_path && + + ## DADA2 error rate plots + #if str($pipeline_name) == "DADA2" + mkdir $dada2_error_rate_plots.files_path && + cp DADA2_OTU_tables/Error_rate_plots/error_rate_plots.html $dada2_error_rate_plots && + cp -r DADA2_OTU_tables/Error_rate_plots/*.pdf $dada2_error_rate_plots.files_path && + #end if + + ## Categories data + #if str($categories_file_in) != 'None' + ## Alpha diversity boxplots + mkdir $alpha_div_boxplots.files_path && + cp alpha_diversity_boxplots.html "$alpha_div_boxplots" && + cp RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf $alpha_div_boxplots.files_path && + #end if + + ## Pipeline outputs (log files etc) + mkdir $log_files.files_path && + cp Amplicon_analysis_pipeline.log $log_files.files_path && + cp pipeline.log $log_files.files_path && + cp Pipeline_outputs.txt $log_files.files_path && + cp 
Metatable_log/Metatable.html $log_files.files_path && + cp pipeline_outputs.html "$log_files" + ]]></command> + <inputs> + <param name="title" type="text" value="test" size="25" + label="Title" help="Optional text that will be added to the output dataset names" /> + <param type="data" name="metatable_file_in" format="tabular" + label="Input Metatable.txt file" /> + <param type="data" name="categories_file_in" format="txt" + label="Input Categories.txt file" optional="true" + help="(optional)" /> + <conditional name="input_type"> + <param name="pairs_or_collection" type="select" + label="Input FASTQ type"> + <option value="pairs_of_files">Pairs of datasets</option> + <option value="collection" selected="true">Dataset pairs in a collection</option> + </param> + <when value="collection"> + <param name="fastq_collection" type="data_collection" + format="fastqsanger,fastq" collection_type="list:paired" + label="Collection of FASTQ forward and reverse (R1/R2) pairs" + help="Each FASTQ pair will be treated as one sample; the name of each sample will be taken from the first column of the Metatable file " /> + </when> + <when value="pairs_of_files"> + <repeat name="fastq_pairs" title="Input fastq pairs" min="1"> + <param type="text" name="name" value="" + label="Final name for FASTQ pair" /> + <param type="data" name="fastq_r1" format="fastqsanger,fastq" + label="FASTQ with forward reads (R1)" /> + <param type="data" name="fastq_r2" format="fastqsanger,fastq" + label="FASTQ with reverse reads (R2)" /> + </repeat> + </when> + </conditional> + <param type="text" name="forward_pcr_primer" value="" + label="Forward PCR primer sequence" + help="Optional; must not include barcode or adapter sequence (-g)" /> + <param type="text" name="reverse_pcr_primer" value="" + label="Reverse PCR primer sequence" + help="Optional; must not include barcode or adapter sequence (-G)" /> + <param type="integer" name="trimming_threshold" value="20" + label="Threshold quality below which read will 
be trimmed" + help="Phred score; default is 20 (-q)" /> + <param type="integer" name="minimum_overlap" value="10" + label="Minimum overlap in bp between forward and reverse reads" + help="Default is 10 (-O)" /> + <param type="integer" name="minimum_length" value="200" + label="Minimum length in bp to keep sequence after overlapping" + help="Default is 200 (-L)" /> + <param type="integer" name="sliding_window_length" value="10" + label="Minimum length in bp to retain a read after trimming" + help="Supplied to Sickle; default is 10 (-l)" /> + <conditional name="pipeline"> + <param type="select" name="pipeline_name" + label="Pipeline to use for analysis"> + <option value="Vsearch" selected="true" >Vsearch</option> + <option value="DADA2">DADA2</option> + </param> + <when value="Vsearch"> + <param type="select" name="reference_database" + label="Reference database"> + <option value="" selected="true">GreenGenes</option> + <option value="-S">Silva</option> + <option value="-H">Human Oral Microbiome Database (HOMD)</option> + </param> + </when> + <when value="DADA2"> + </when> + </conditional> + </inputs> + <outputs> + <data format="tabular" name="metatable_mod" + label="${tool.name}:${title} Metatable_mod.txt" /> + <data format="tabular" name="read_counts_out" + label="${tool.name} (${pipeline.pipeline_name}):${title} read counts"> + <filter>pipeline['pipeline_name'] == 'Vsearch'</filter> + </data> + <data format="biom" name="tax_otu_table_biom_file" + label="${tool.name} (${pipeline.pipeline_name}):${title} tax OTU table (biom format)" /> + <data format="tabular" name="otus_tre_file" + label="${tool.name} (${pipeline.pipeline_name}):${title} otus.tre" /> + <data format="html" name="phylum_genus_dist_barcharts_html" + label="${tool.name} (${pipeline.pipeline_name}):${title} phylum genus dist barcharts HTML" /> + <data format="tabular" name="otus_count_file" + label="${tool.name} (${pipeline.pipeline_name}):${title} OTUs count file" /> + <data format="tabular" 
name="table_summary_file" + label="${tool.name} (${pipeline.pipeline_name}):${title} table summary file" /> + <data format="fasta" name="dereplicated_nonchimera_otus_fasta" + label="${tool.name} (${pipeline.pipeline_name}):${title} multiplexed linearized dereplicated mc2 repset nonchimeras OTUs FASTA" /> + <data format="html" name="fastqc_quality_boxplots_html" + label="${tool.name} (${pipeline.pipeline_name}):${title} FastQC per-base quality boxplots HTML" /> + <data format="pdf" name="heatmap_otu_table_pdf" + label="${tool.name} (${pipeline.pipeline_name}):${title} heatmap OTU table PDF" /> + <data format="html" name="beta_div_even_weighted_2d_plots" + label="${tool.name} (${pipeline.pipeline_name}):${title} beta diversity weighted 2D plots HTML" /> + <data format="html" name="beta_div_even_unweighted_2d_plots" + label="${tool.name} (${pipeline.pipeline_name}):${title} beta diversity unweighted 2D plots HTML" /> + <data format="html" name="alpha_div_rarefaction_plots" + label="${tool.name} (${pipeline.pipeline_name}):${title} alpha diversity rarefaction plots HTML" /> + <data format="html" name="dada2_error_rate_plots" + label="${tool.name} (${pipeline.pipeline_name}):${title} DADA2 error rate plots"> + <filter>pipeline['pipeline_name'] == 'DADA2'</filter> + </data> + <data format="html" name="alpha_div_boxplots" + label="${tool.name} (${pipeline.pipeline_name}):${title} alpha diversity boxplots"> + <filter>categories_file_in is not None</filter> + </data> + <data format="html" name="log_files" + label="${tool.name} (${pipeline.pipeline_name}):${title} log files" /> + </outputs> + <tests> + </tests> + <help><![CDATA[ + +What it does +------------ + +This pipeline has been designed for the analysis of 16S rRNA data from +Illumina Miseq (Casava >= 1.8) paired-end reads. + +Usage +----- + +1. 
Preparation of the mapping file and format of unique sample id +***************************************************************** + +Before using the amplicon analysis pipeline, follow the steps below to +avoid analysis failures and to ensure that samples are labelled +appropriately. Sample names are derived from the fastq file names +generated by the sequencer. The labels will include everything +between the beginning of the name and the sample number (from C11 +to S19 in Fig. 1). + +.. image:: Pipeline_description_Fig1.png + :height: 46 + :width: 382 + +**Figure 1** + +If analysing 16S data from multiple runs: + +Samples from different runs may have identical IDs. For example, +when sequencing the same samples twice, these could by chance be at +the same position in both runs. This would cause the fastq files +to have exactly the same IDs (Fig. 2). + +.. image:: Pipeline_description_Fig2.png + :height: 100 + :width: 463 + +**Figure 2** + +If sample IDs are identical, the pipeline will fail to run and +generate an error at the beginning of the analysis. + +To avoid having to change the file names, ensure before uploading +that the sample IDs are not repeated. + +2. Uploading the files +********************** + +Click on **Get Data/Upload File** in the Galaxy tool panel on the +left-hand side. + +From the pop-up window, choose how to upload the files. The +**Choose local file** option can be used for files up to 4Gb. Fastq files +from Illumina MiSeq will rarely be bigger than 4Gb, so this option is +recommended. + +After choosing the files, click **Start** to begin the upload. The window +can now be closed and the files will be uploaded onto the Galaxy server. +You will see the progress in the ``HISTORY`` panel on the right-hand +side of the screen. The colour will change from grey (queuing) to +yellow (uploading) and finally to green (uploaded). 
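A quick way to check for repeated sample IDs before uploading is to compare the fastq file names from all runs on the command line. This is an illustrative sketch, not part of the pipeline; the ``find_duplicate_ids`` helper name and the ``<SampleID>_R1...`` naming convention are assumptions:

```shell
# Print any sample ID that occurs more than once in a list of
# R1 fastq file names read from stdin (one name per line).
# Assumes names follow the "<SampleID>_R1..." convention.
find_duplicate_ids() {
    sed 's/_R1.*//' | sort | uniq -d
}

# Example: list R1 files from two run directories and report clashes
# ls RUN1/*_R1*.fastq.gz RUN2/*_R1*.fastq.gz | xargs -n1 basename | find_duplicate_ids
```

Any ID printed by this check should be renamed before the files are uploaded.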
+ +Once all the files are uploaded, click on the operations on multiple +datasets icon and select the fastq files that need to be analysed. +Click on the tab **For all selected...** and on the option +**Build List of Dataset pairs** (Fig. 3). + +.. image:: Pipeline_description_Fig3.png + :height: 247 + :width: 586 + +**Figure 3** + +Change the filter parameters ``_1`` and ``_2`` to ``_R1`` and ``_R2``. +The forward (R1) and reverse (R2) fastq files should now appear in the +corresponding columns. + +Select **Autopair**. This creates a collection of paired fastq files for +the forward and reverse reads for each sample. The names of the pairs are +the ones the pipeline will use. You are free to change the names at this +point as long as they are the same as those used in the Metatable file +(see section 3). + +Name the collection and click on **create list**. This reduces the time +required to input the forward and reverse reads for each individual sample. + +3. Create the Metatable files +***************************** + +Metatable.txt +~~~~~~~~~~~~~ + +Click on the list of pairs you just created to see the names of the +individual pairs. These are the names used by the pipeline and, +therefore, the names that need to be used in the Metatable file. + +The Metatable file has to be in QIIME format. You can find a description +of it on the QIIME website: http://qiime.org/documentation/file_formats.html + +EXAMPLE:: + + #SampleID BarcodeSequence LinkerPrimerSequence Disease Gender Description + Mock-RUN1 TAAGGCGAGCGTAAGA PsA Male Control + Mock-RUN2 CGTACTAGGCGTAAGA PsA Male Control + Mock-RUN3 AGGCAGAAGCGTAAGA PsC Female Control + +Briefly: the column ``LinkerPrimerSequence`` is empty but it cannot be +deleted. The header is very important: ``#SampleID``, ``BarcodeSequence``, +``LinkerPrimerSequence`` and ``Description`` are mandatory. Between +``LinkerPrimerSequence`` and ``Description`` you can add as many columns +as you want. 
For every column a PCoA plot will be created (see the +**Results** section). You can create this file in Excel and save it +as ``Text (Tab delimited)``. + +During the analysis the Metatable.txt file will be checked to ensure +that it has the correct format. If necessary, it will be modified and +made available as ``Metatable_mod.txt`` in the history panel. If you are +going to use the metatable file for any other statistical analyses, +remember to use the ``Metatable_mod.txt`` one, otherwise the sample +names might not match! + +Categories.txt (optional) +~~~~~~~~~~~~~~~~~~~~~~~~~ + +This file is required if you want box plots comparing the alpha +diversity indices (see the **Results** section). The file is a list +(without a header and IN ONE COLUMN) of categories present in the +Metatable.txt file. THE NAMES YOU ARE USING HAVE TO BE THE SAME AS THE +ONES USED IN THE METATABLE.TXT. You can create this file in Excel and +save it as ``Text (Tab delimited)``. + +EXAMPLE:: + + Disease + Gender + +Metatable and categories files can be uploaded using Get Data in the +same way as the fastq files. + +4. Analysis +*********** + +Under **Amplicon_Analysis_Pipeline**: + + * **Title** Name to distinguish between runs. It will be shown at + the beginning of each output file name. + + * **Input Metatable.txt file** Select the Metatable.txt file related to + this analysis. + + * **Input Categories.txt file (Optional)** Select the Categories.txt file + related to this analysis. + + * **Input FASTQ type** Select *Dataset pairs in a collection* and then + the collection of pairs you created earlier. + + * **Forward/Reverse PCR primer sequence** If the PCR primer sequences + were not removed by the MiSeq software during fastq creation, they + have to be removed before the analysis. Insert the PCR primer sequence + in the corresponding field. DO NOT include any barcode or adapter + sequence. 
If the PCR primers have already been trimmed by the MiSeq, + including the sequence in this field will lead to an error. + Only include the sequences if they are still present in the fastq files. + + * **Threshold quality below which reads will be trimmed** Choose the + Phred score used by Sickle to trim the reads at the 3’ end. + + * **Minimum length to retain a read after trimming** If the read length + after trimming is shorter than this user-defined length, the read, along + with the corresponding read pair, will be discarded. + + * **Minimum overlap in bp between forward and reverse reads** Choose the + minimum basepair overlap used by Pandaseq to assemble the reads. + Default is 10. + + * **Minimum length in bp to keep a sequence after overlapping** Choose the + minimum sequence length used by Pandaseq to keep a sequence after + overlapping. This depends on the expected amplicon length (e.g. 380 + for V3-V4 16S sequencing, where the expected length is ~440bp). + + * **Pipeline to use for analysis** Choose the pipeline to use for OTU + clustering and chimera removal. The Galaxy tool supports the ``Vsearch`` + and ``DADA2`` pipelines. + + * **Reference database** Choose between ``GreenGenes``, ``Silva`` or + ``HOMD`` (Human Oral Microbiome Database) for taxa assignment. + +Click on **Execute** to start the analysis. + +5. Results +********** + +Results are generated entirely using QIIME scripts and will appear +in the History panel when the analysis is completed. 
+ +The following outputs are captured: + + * **Vsearch_tax_OTU_table.biom|DADA2_tax_OTU_table.biom (biom format)** + The OTU table in BIOM format (http://biom-format.org/) + + * **otus.tre** Phylogenetic tree constructed using the ``make_phylogeny.py`` + (FastTree) QIIME script (http://qiime.org/scripts/make_phylogeny.html) + + * **Phylum_genus_dist_barcharts_HTML** HTML file with bar charts at + Phylum, Genus and Species level + (http://qiime.org/scripts/summarize_taxa.html and + http://qiime.org/scripts/plot_taxa_summary.html) + + * **OTUs_count_file** Summary of OTU counts per sample + (http://biom-format.org/documentation/summarizing_biom_tables.html) + + * **Table_summary_file** Summary of sequence counts per sample + (http://biom-format.org/documentation/summarizing_biom_tables.html) + + * **multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta|seqs.fa** + Fasta file with OTU sequences (Vsearch|DADA2) + + * **Heatmap_PDF** OTU heatmap in PDF format + (http://qiime.org/1.8.0/scripts/make_otu_heatmap_html.html) + + * **Vsearch_beta_diversity_weighted_2D_plots_HTML** PCoA plots in HTML + format using the weighted UniFrac distance measure. Samples are grouped + by the column names present in the Metatable file. The samples are + first rarefied to the minimum sequencing depth + (http://qiime.org/scripts/beta_diversity_through_plots.html) + + * **Vsearch_beta_diversity_unweighted_2D_plots_HTML** PCoA plots in HTML + format using the unweighted UniFrac distance measure. Samples are grouped + by the column names present in the Metatable file. 
The samples are + first rarefied to the minimum sequencing depth + (http://qiime.org/scripts/beta_diversity_through_plots.html) + +Code availability +----------------- + +**Code is available at** https://github.com/MTutino/Amplicon_analysis + +Credits +------- + +Pipeline author: Mauro Tutino + +Galaxy tool: Peter Briggs + + ]]></help> + <citations> + <citation type="bibtex"> + @misc{githubAmplicon_analysis, + author = {Tutino, Mauro}, + year = {2017}, + title = {Amplicon Analysis Pipeline}, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/MTutino/Amplicon_analysis}, +}</citation> + </citations> +</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/install_amplicon_analysis-1.3.5.sh Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,394 @@ +#!/bin/sh -e +# +# Prototype script to setup a conda environment with the +# dependencies needed for the Amplicon_analysis_pipeline +# script +# +# Handle command line +usage() +{ + echo "Usage: $(basename $0) [DIR]" + echo "" + echo "Installs the Amplicon_analysis_pipeline package plus" + echo "dependencies in directory DIR (or current directory " + echo "if DIR not supplied)" +} +if [ ! -z "$1" ] ; then + # Check if help was requested + case "$1" in + --help|-h) + usage + exit 0 + ;; + esac + # Assume it's the installation directory + cd $1 +fi +# Versions +PIPELINE_VERSION=1.3.5 +CONDA_REQUIRED_VERSION=4.6.14 +RDP_CLASSIFIER_VERSION=2.2 +# Directories +TOP_DIR=$(pwd)/Amplicon_analysis-${PIPELINE_VERSION} +BIN_DIR=${TOP_DIR}/bin +CONDA_DIR=${TOP_DIR}/conda +CONDA_BIN=${CONDA_DIR}/bin +CONDA_LIB=${CONDA_DIR}/lib +CONDA=${CONDA_BIN}/conda +ENV_NAME="amplicon_analysis_pipeline@${PIPELINE_VERSION}" +ENV_DIR=${CONDA_DIR}/envs/$ENV_NAME +# +# Functions +# +# Report failure and terminate script +fail() +{ + echo "" + echo ERROR $@ >&2 + echo "" + echo "$(basename $0): installation failed" + exit 1 +} +# +# Rewrite the shebangs in the installed conda scripts +# to remove the full path to conda 'bin' directory +rewrite_conda_shebangs() +{ + pattern="s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" + find ${CONDA_BIN} -type f -exec sed -i "$pattern" {} \; +} +# +# Reset conda version if required +reset_conda_version() +{ + CONDA_VERSION="$(${CONDA_BIN}/conda -V 2>&1 | head -n 1 | cut -d' ' -f2)" + echo conda version: ${CONDA_VERSION} + if [ "${CONDA_VERSION}" != "${CONDA_REQUIRED_VERSION}" ] ; then + echo "Resetting conda to last known working version $CONDA_REQUIRED_VERSION" + ${CONDA_BIN}/conda config --set allow_conda_downgrades true + ${CONDA_BIN}/conda install -y conda=${CONDA_REQUIRED_VERSION} + else + echo "conda version ok" + fi +} +# +# 
Install conda +install_conda() +{ + echo "++++++++++++++++" + echo "Installing conda" + echo "++++++++++++++++" + if [ -e ${CONDA_DIR} ] ; then + echo "*** $CONDA_DIR already exists ***" >&2 + return + fi + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh + bash ./Miniconda2-latest-Linux-x86_64.sh -b -p ${CONDA_DIR} + echo Installed conda in ${CONDA_DIR} + # Reset the conda version to a known working version + # (to avoid problems observed with e.g. conda 4.7.10) + echo "" + reset_conda_version + # Update the installation files + # This is to avoid problems when the length the installation + # directory path exceeds the limit for the shebang statement + # in the conda files + echo "" + echo -n "Rewriting conda shebangs..." + rewrite_conda_shebangs + echo "ok" + echo -n "Adding conda bin to PATH..." + PATH=${CONDA_BIN}:$PATH + echo "ok" + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# Create conda environment +install_conda_packages() +{ + echo "+++++++++++++++++++++++++" + echo "Installing conda packages" + echo "+++++++++++++++++++++++++" + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + cat >environment.yml <<EOF +name: ${ENV_NAME} +channels: + - defaults + - conda-forge + - bioconda +dependencies: + - python=2.7 + - cutadapt=1.8 + - sickle-trim=1.33 + - bioawk=1.0 + - pandaseq=2.8.1 + - spades=3.10.1 + - fastqc=0.11.3 + - qiime=1.9.1 + - blast-legacy=2.2.26 + - fasta-splitter=0.2.6 + - rdp_classifier=$RDP_CLASSIFIER_VERSION + - vsearch=2.10.4 + - r=3.5.1 + - r-tidyverse=1.2.1 + - bioconductor-dada2=1.8 + - bioconductor-biomformat=1.8.0 +EOF + ${CONDA} env create --name "${ENV_NAME}" -f environment.yml + echo Created conda environment in ${ENV_DIR} + cd $cwd + rm -rf $wd/* + rmdir $wd + # + # Patch qiime 1.9.1 tools to switch deprecated 'axisbg' + # matplotlib property to 'facecolor': + # https://matplotlib.org/api/prev_api_changes/api_changes_2.0.0.html + echo "" + for exe in 
make_2d_plots.py plot_taxa_summary.py ; do + echo -n "Patching ${exe}..." + find ${CONDA_DIR} -type f -name "$exe" -exec sed -i 's/axisbg=/facecolor=/g' {} \; + echo "done" + done + # + # Patch qiime 1.9.1 tools to switch deprecated 'set_axis_bgcolor' + # method call to 'set_facecolor': + # https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_axis_bgcolor.html + for exe in make_rarefaction_plots.py ; do + echo -n "Patching ${exe}..." + find ${CONDA_DIR} -type f -name "$exe" -exec sed -i 's/set_axis_bgcolor/set_facecolor/g' {} \; + echo "done" + done +} +# +# Install all the non-conda dependencies in a single +# function (invokes separate functions for each package) +install_non_conda_packages() +{ + echo "+++++++++++++++++++++++++++++" + echo "Installing non-conda packages" + echo "+++++++++++++++++++++++++++++" + # Temporary working directory + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + # Amplicon analysis pipeline + echo -n "Installing Amplicon_analysis_pipeline..." + if [ -e ${BIN_DIR}/Amplicon_analysis_pipeline.sh ] ; then + echo "already installed" + else + install_amplicon_analysis_pipeline + echo "ok" + fi + # ChimeraSlayer + echo -n "Installing ChimeraSlayer..." 
+ if [ -e ${BIN_DIR}/ChimeraSlayer.pl ] ; then + echo "already installed" + else + install_chimeraslayer + echo "ok" + fi + # Uclust + # This no longer seems to be available for download from + # drive5.com so don't download + echo "WARNING uclust not available: skipping installation" +} +# +# Amplicon analyis pipeline +install_amplicon_analysis_pipeline() +{ + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://github.com/MTutino/Amplicon_analysis/archive/${PIPELINE_VERSION}.tar.gz + tar zxf ${PIPELINE_VERSION}.tar.gz + cd Amplicon_analysis-${PIPELINE_VERSION} + INSTALL_DIR=${TOP_DIR}/share/amplicon_analysis_pipeline-${PIPELINE_VERSION} + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/amplicon_analysis_pipeline + for f in *.sh *.R ; do + /bin/cp $f $INSTALL_DIR + done + /bin/cp -r uc2otutab $INSTALL_DIR + mkdir -p ${BIN_DIR} + cat >${BIN_DIR}/Amplicon_analysis_pipeline.sh <<EOF +#!/usr/bin/env bash +# +# Point to Qiime config +export QIIME_CONFIG_FP=${TOP_DIR}/qiime/qiime_config +# Set up the RDP jar file +export RDP_JAR_PATH=${TOP_DIR}/share/rdp_classifier/rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar +# Set the Matplotlib backend +export MPLBACKEND="agg" +# Put the scripts onto the PATH +export PATH=${BIN_DIR}:${INSTALL_DIR}:\$PATH +# Activate the conda environment +export PATH=${CONDA_BIN}:\$PATH +source ${CONDA_BIN}/activate ${ENV_NAME} +# Execute the driver script with the supplied arguments +$INSTALL_DIR/Amplicon_analysis_pipeline.sh \$@ +exit \$? 
+EOF + chmod 0755 ${BIN_DIR}/Amplicon_analysis_pipeline.sh + cat >${BIN_DIR}/install_reference_data.sh <<EOF +#!/bin/bash -e +# +function usage() { + echo "Usage: \$(basename \$0) DIR" +} +if [ -z "\$1" ] ; then + usage + exit 0 +elif [ "\$1" == "--help" ] || [ "\$1" == "-h" ] ; then + usage + echo "" + echo "Install reference data into DIR" + exit 0 +fi +echo "==========================================" +echo "Installing Amplicon analysis pipeline data" +echo "==========================================" +if [ ! -e "\$1" ] ; then + echo "Making directory \$1" + mkdir -p \$1 +fi +cd \$1 +DATA_DIR=\$(pwd) +echo "Installing reference data under \$DATA_DIR" +$INSTALL_DIR/References.sh +echo "" +echo "Use '-r \$DATA_DIR' when running Amplicon_analysis_pipeline.sh" +echo "to use the reference data from this directory" +echo "" +echo "\$(basename \$0): finished" +EOF + chmod 0755 ${BIN_DIR}/install_reference_data.sh + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# ChimeraSlayer +install_chimeraslayer() +{ + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz + tar zxf microbiomeutil_2010-04-29.tar.gz + cd microbiomeutil_2010-04-29 + INSTALL_DIR=${TOP_DIR}/share/microbiome_chimeraslayer-2010-04-29 + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/microbiome_chimeraslayer + /bin/cp -r ChimeraSlayer $INSTALL_DIR + cat >${BIN_DIR}/ChimeraSlayer.pl <<EOF +#!/usr/bin/env bash +export PATH=$INSTALL_DIR:\$PATH +$INSTALL_DIR/ChimeraSlayer/ChimeraSlayer.pl \$@ +EOF + chmod 0755 ${INSTALL_DIR}/ChimeraSlayer/ChimeraSlayer.pl + chmod 0755 ${BIN_DIR}/ChimeraSlayer.pl + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# uclust required for QIIME/pyNAST +# License only allows this version to be used with those two packages +# See: http://drive5.com/uclust/downloads1_2_22q.html +install_uclust() +{ + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q 
http://drive5.com/uclust/uclustq1.2.22_i86linux64 + INSTALL_DIR=${TOP_DIR}/share/uclust-1.2.22 + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/uclust + /bin/mv uclustq1.2.22_i86linux64 ${INSTALL_DIR}/uclust + chmod 0755 ${INSTALL_DIR}/uclust + ln -s ${INSTALL_DIR}/uclust ${BIN_DIR} + cd $cwd + rm -rf $wd/* + rmdir $wd +} +setup_pipeline_environment() +{ + echo "+++++++++++++++++++++++++++++++" + echo "Setting up pipeline environment" + echo "+++++++++++++++++++++++++++++++" + # fasta_splitter.pl + echo -n "Setting up fasta_splitter.pl..." + if [ -e ${BIN_DIR}/fasta-splitter.pl ] ; then + echo "already exists" + elif [ ! -e ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ] ; then + echo "failed" + fail "fasta-splitter.pl not found" + else + ln -s ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ${BIN_DIR}/fasta-splitter.pl + echo "ok" + fi + # rdp_classifier.jar + local rdp_classifier_jar=rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar + echo -n "Setting up rdp_classifier.jar..." + if [ -e ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} ] ; then + echo "already exists" + elif [ ! -e ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ] ; then + echo "failed" + fail "rdp_classifier.jar not found" + else + mkdir -p ${TOP_DIR}/share/rdp_classifier + ln -s ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} + echo "ok" + fi + # qiime_config + echo -n "Setting up qiime_config..." 
+ if [ -e ${TOP_DIR}/qiime/qiime_config ] ; then + echo "already exists" + else + mkdir -p ${TOP_DIR}/qiime + cat >${TOP_DIR}/qiime/qiime_config <<EOF-qiime-config +qiime_scripts_dir ${ENV_DIR}/bin +EOF-qiime-config + echo "ok" + fi +} +# +# Top level script does the installation +echo "=======================================" +echo "Amplicon_analysis_pipeline installation" +echo "=======================================" +echo "Installing into ${TOP_DIR}" +if [ -e ${TOP_DIR} ] ; then + fail "Directory already exists" +fi +mkdir -p ${TOP_DIR} +install_conda +install_conda_packages +install_non_conda_packages +setup_pipeline_environment +echo "====================================" +echo "Amplicon_analysis_pipeline installed" +echo "====================================" +echo "" +echo "Install reference data using:" +echo "" +echo "\$ ${BIN_DIR}/install_reference_data.sh DIR" +echo "" +echo "Run pipeline scripts using:" +echo "" +echo "\$ ${BIN_DIR}/Amplicon_analysis_pipeline.sh ..." +echo "" +echo "(or add ${BIN_DIR} to your PATH)" +echo "" +echo "$(basename $0): finished" +## +#
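To make the shebang handling above concrete, the substitution performed by ``rewrite_conda_shebangs`` can be run in isolation on a sample script header. This is an illustrative sketch; the ``CONDA_BIN`` value and the ``rewrite_shebang`` name are assumptions:

```shell
# Same sed pattern as rewrite_conda_shebangs: replace an absolute
# conda-bin shebang with a portable /usr/bin/env one, avoiding the
# kernel's shebang length limit on deeply nested install paths.
CONDA_BIN=/tmp/Amplicon_analysis-1.3.5/conda/bin   # example path
rewrite_shebang() {
    sed "s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g"
}
```

For example, a shebang like ``#!/tmp/Amplicon_analysis-1.3.5/conda/bin/python`` becomes ``#!/usr/bin/env python``.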
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/install_amplicon_analysis.sh Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,425 @@ +#!/bin/sh -e +# +# Prototype script to setup a conda environment with the +# dependencies needed for the Amplicon_analysis_pipeline +# script +# +# Handle command line +usage() +{ + echo "Usage: $(basename $0) [DIR]" + echo "" + echo "Installs the Amplicon_analysis_pipeline package plus" + echo "dependencies in directory DIR (or current directory " + echo "if DIR not supplied)" +} +if [ ! -z "$1" ] ; then + # Check if help was requested + case "$1" in + --help|-h) + usage + exit 0 + ;; + esac + # Assume it's the installation directory + cd $1 +fi +# Versions +PIPELINE_VERSION=1.2.3 +RDP_CLASSIFIER_VERSION=2.2 +# Directories +TOP_DIR=$(pwd)/Amplicon_analysis-${PIPELINE_VERSION} +BIN_DIR=${TOP_DIR}/bin +CONDA_DIR=${TOP_DIR}/conda +CONDA_BIN=${CONDA_DIR}/bin +CONDA_LIB=${CONDA_DIR}/lib +CONDA=${CONDA_BIN}/conda +ENV_NAME="amplicon_analysis_pipeline@${PIPELINE_VERSION}" +ENV_DIR=${CONDA_DIR}/envs/$ENV_NAME +# +# Functions +# +# Report failure and terminate script +fail() +{ + echo "" + echo ERROR $@ >&2 + echo "" + echo "$(basename $0): installation failed" + exit 1 +} +# +# Rewrite the shebangs in the installed conda scripts +# to remove the full path to conda 'bin' directory +rewrite_conda_shebangs() +{ + pattern="s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" + find ${CONDA_BIN} -type f -exec sed -i "$pattern" {} \; +} +# +# Install conda +install_conda() +{ + echo "++++++++++++++++" + echo "Installing conda" + echo "++++++++++++++++" + if [ -e ${CONDA_DIR} ] ; then + echo "*** $CONDA_DIR already exists ***" >&2 + return + fi + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh + bash ./Miniconda2-latest-Linux-x86_64.sh -b -p ${CONDA_DIR} + echo Installed conda in ${CONDA_DIR} + # Update the installation files + # This is to avoid problems when the length the 
installation + # directory path exceeds the limit for the shebang statement + # in the conda files + echo "" + echo -n "Rewriting conda shebangs..." + rewrite_conda_shebangs + echo "ok" + echo -n "Adding conda bin to PATH..." + PATH=${CONDA_BIN}:$PATH + echo "ok" + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# Create conda environment +install_conda_packages() +{ + echo "+++++++++++++++++++++++++" + echo "Installing conda packages" + echo "+++++++++++++++++++++++++" + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + cat >environment.yml <<EOF +name: ${ENV_NAME} +channels: + - defaults + - conda-forge + - bioconda +dependencies: + - python=2.7 + - cutadapt=1.11 + - sickle-trim=1.33 + - bioawk=1.0 + - pandaseq=2.8.1 + - spades=3.5.0 + - fastqc=0.11.3 + - qiime=1.8.0 + - blast-legacy=2.2.26 + - fasta-splitter=0.2.4 + - rdp_classifier=$RDP_CLASSIFIER_VERSION + - vsearch=1.1.3 + # Need to explicitly specify libgfortran + # version (otherwise get version incompatible + # with numpy=1.7.1) + - libgfortran=1.0 + # Compilers needed to build R + - gcc_linux-64 + - gxx_linux-64 + - gfortran_linux-64 +EOF + ${CONDA} env create --name "${ENV_NAME}" -f environment.yml + echo Created conda environment in ${ENV_DIR} + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# Install all the non-conda dependencies in a single +# function (invokes separate functions for each package) +install_non_conda_packages() +{ + echo "+++++++++++++++++++++++++++++" + echo "Installing non-conda packages" + echo "+++++++++++++++++++++++++++++" + # Temporary working directory + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + # Amplicon analysis pipeline + echo -n "Installing Amplicon_analysis_pipeline..." + if [ -e ${BIN_DIR}/Amplicon_analysis_pipeline.sh ] ; then + echo "already installed" + else + install_amplicon_analysis_pipeline + echo "ok" + fi + # ChimeraSlayer + echo -n "Installing ChimeraSlayer..." 
+ if [ -e ${BIN_DIR}/ChimeraSlayer.pl ] ; then + echo "already installed" + else + install_chimeraslayer + echo "ok" + fi + # Uclust + echo -n "Installing uclust for QIIME/pyNAST..." + if [ -e ${BIN_DIR}/uclust ] ; then + echo "already installed" + else + install_uclust + echo "ok" + fi + # R 3.2.1" + echo -n "Checking for R 3.2.1..." + if [ -e ${BIN_DIR}/R ] ; then + echo "R already installed" + else + echo "not found" + install_R_3_2_1 + fi +} +# +# Amplicon analyis pipeline +install_amplicon_analysis_pipeline() +{ + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://github.com/MTutino/Amplicon_analysis/archive/v${PIPELINE_VERSION}.tar.gz + tar zxf v${PIPELINE_VERSION}.tar.gz + cd Amplicon_analysis-${PIPELINE_VERSION} + INSTALL_DIR=${TOP_DIR}/share/amplicon_analysis_pipeline-${PIPELINE_VERSION} + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/amplicon_analysis_pipeline + for f in *.sh ; do + /bin/cp $f $INSTALL_DIR + done + /bin/cp -r uc2otutab $INSTALL_DIR + mkdir -p ${BIN_DIR} + cat >${BIN_DIR}/Amplicon_analysis_pipeline.sh <<EOF +#!/usr/bin/env bash +# +# Point to Qiime config +export QIIME_CONFIG_FP=${TOP_DIR}/qiime/qiime_config +# Set up the RDP jar file +export RDP_JAR_PATH=${TOP_DIR}/share/rdp_classifier/rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar +# Put the scripts onto the PATH +export PATH=${BIN_DIR}:${INSTALL_DIR}:\$PATH +# Activate the conda environment +export PATH=${CONDA_BIN}:\$PATH +source ${CONDA_BIN}/activate ${ENV_NAME} +# Execute the driver script with the supplied arguments +$INSTALL_DIR/Amplicon_analysis_pipeline.sh \$@ +exit \$? 
+EOF + chmod 0755 ${BIN_DIR}/Amplicon_analysis_pipeline.sh + cat >${BIN_DIR}/install_reference_data.sh <<EOF +#!/bin/bash -e +# +function usage() { + echo "Usage: \$(basename \$0) DIR" +} +if [ -z "\$1" ] ; then + usage + exit 0 +elif [ "\$1" == "--help" ] || [ "\$1" == "-h" ] ; then + usage + echo "" + echo "Install reference data into DIR" + exit 0 +fi +echo "==========================================" +echo "Installing Amplicon analysis pipeline data" +echo "==========================================" +if [ ! -e "\$1" ] ; then + echo "Making directory \$1" + mkdir -p \$1 +fi +cd \$1 +DATA_DIR=\$(pwd) +echo "Installing reference data under \$DATA_DIR" +$INSTALL_DIR/References.sh +echo "" +echo "Use '-r \$DATA_DIR' when running Amplicon_analysis_pipeline.sh" +echo "to use the reference data from this directory" +echo "" +echo "\$(basename \$0): finished" +EOF + chmod 0755 ${BIN_DIR}/install_reference_data.sh + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# ChimeraSlayer +install_chimeraslayer() +{ + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz + tar zxf microbiomeutil_2010-04-29.tar.gz + cd microbiomeutil_2010-04-29 + INSTALL_DIR=${TOP_DIR}/share/microbiome_chimeraslayer-2010-04-29 + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/microbiome_chimeraslayer + /bin/cp -r ChimeraSlayer $INSTALL_DIR + cat >${BIN_DIR}/ChimeraSlayer.pl <<EOF +#!/usr/bin/env bash +export PATH=$INSTALL_DIR:\$PATH +$INSTALL_DIR/ChimeraSlayer/ChimeraSlayer.pl \$@ +EOF + chmod 0755 ${INSTALL_DIR}/ChimeraSlayer/ChimeraSlayer.pl + chmod 0755 ${BIN_DIR}/ChimeraSlayer.pl + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# uclust required for QIIME/pyNAST +# License only allows this version to be used with those two packages +# See: http://drive5.com/uclust/downloads1_2_22q.html +install_uclust() +{ + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd
$wd + wget -q http://drive5.com/uclust/uclustq1.2.22_i86linux64 + INSTALL_DIR=${TOP_DIR}/share/uclust-1.2.22 + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/uclust + /bin/mv uclustq1.2.22_i86linux64 ${INSTALL_DIR}/uclust + chmod 0755 ${INSTALL_DIR}/uclust + ln -s ${INSTALL_DIR}/uclust ${BIN_DIR} + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# R 3.2.1 +# Can't use version from conda due to dependency conflicts +install_R_3_2_1() +{ + . ${CONDA_BIN}/activate ${ENV_NAME} + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + echo -n "Fetching R 3.2.1 source code..." + wget -q http://cran.r-project.org/src/base/R-3/R-3.2.1.tar.gz + echo "ok" + INSTALL_DIR=${TOP_DIR} + mkdir -p $INSTALL_DIR + echo -n "Unpacking source code..." + tar xzf R-3.2.1.tar.gz >INSTALL.log 2>&1 + echo "ok" + cd R-3.2.1 + echo -n "Running configure..." + ./configure --prefix=$INSTALL_DIR --with-x=no --with-readline=no >>INSTALL.log 2>&1 + echo "ok" + echo -n "Running make..." + make >>INSTALL.log 2>&1 + echo "ok" + echo -n "Running make install..." + make install >>INSTALL.log 2>&1 + echo "ok" + cd $cwd + rm -rf $wd/* + rmdir $wd + . ${CONDA_BIN}/deactivate +} +setup_pipeline_environment() +{ + echo "+++++++++++++++++++++++++++++++" + echo "Setting up pipeline environment" + echo "+++++++++++++++++++++++++++++++" + # vsearch113 + echo -n "Setting up vsearch113..." + if [ -e ${BIN_DIR}/vsearch113 ] ; then + echo "already exists" + elif [ ! -e ${ENV_DIR}/bin/vsearch ] ; then + echo "failed" + fail "vsearch not found" + else + ln -s ${ENV_DIR}/bin/vsearch ${BIN_DIR}/vsearch113 + echo "ok" + fi + # fasta_splitter.pl + echo -n "Setting up fasta_splitter.pl..." + if [ -e ${BIN_DIR}/fasta-splitter.pl ] ; then + echo "already exists" + elif [ ! 
-e ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ] ; then + echo "failed" + fail "fasta-splitter.pl not found" + else + ln -s ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ${BIN_DIR}/fasta-splitter.pl + echo "ok" + fi + # rdp_classifier.jar + local rdp_classifier_jar=rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar + echo -n "Setting up rdp_classifier.jar..." + if [ -e ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} ] ; then + echo "already exists" + elif [ ! -e ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ] ; then + echo "failed" + fail "rdp_classifier.jar not found" + else + mkdir -p ${TOP_DIR}/share/rdp_classifier + ln -s ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} + echo "ok" + fi + # qiime_config + echo -n "Setting up qiime_config..." + if [ -e ${TOP_DIR}/qiime/qiime_config ] ; then + echo "already exists" + else + mkdir -p ${TOP_DIR}/qiime + cat >${TOP_DIR}/qiime/qiime_config <<EOF-qiime-config +qiime_scripts_dir ${ENV_DIR}/bin +EOF-qiime-config + echo "ok" + fi +} +# +# Remove the compilers from the conda environment +# Not sure if this step is necessary +remove_conda_compilers() +{ + echo "+++++++++++++++++++++++++++++++++++++++++" + echo "Removing compilers from conda environment" + echo "+++++++++++++++++++++++++++++++++++++++++" + ${CONDA} remove -y -n ${ENV_NAME} gcc_linux-64 gxx_linux-64 gfortran_linux-64 +} +# +# Top level script does the installation +echo "=======================================" +echo "Amplicon_analysis_pipeline installation" +echo "=======================================" +echo "Installing into ${TOP_DIR}" +if [ -e ${TOP_DIR} ] ; then + fail "Directory already exists" +fi +mkdir -p ${TOP_DIR} +install_conda +install_conda_packages +install_non_conda_packages +setup_pipeline_environment +remove_conda_compilers +echo "====================================" +echo "Amplicon_analysis_pipeline installed" +echo "====================================" +echo "" 
+echo "Install reference data using:" +echo "" +echo "\$ ${BIN_DIR}/install_reference_data.sh DIR" +echo "" +echo "Run pipeline scripts using:" +echo "" +echo "\$ ${BIN_DIR}/Amplicon_analysis_pipeline.sh ..." +echo "" +echo "(or add ${BIN_DIR} to your PATH)" +echo "" +echo "$(basename $0): finished" +## +#
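The `setup_pipeline_environment` function above repeats one idempotent pattern per tool: skip if the link already exists, fail if the source file is missing, otherwise create the symlink. A minimal standalone sketch of that pattern, using throwaway temp paths (not the installer's real `${ENV_DIR}`/`${BIN_DIR}`):

```shell
# Sketch of the installer's link-or-fail pattern with throwaway paths
set -e
wd=$(mktemp -d)
mkdir -p "$wd/env/bin" "$wd/bin"
touch "$wd/env/bin/vsearch"          # stand-in for the conda-installed binary
if [ -e "$wd/bin/vsearch113" ] ; then
    result="already exists"         # re-running the installer is a no-op
elif [ ! -e "$wd/env/bin/vsearch" ] ; then
    result="failed"                 # source missing: the real script calls fail()
else
    ln -s "$wd/env/bin/vsearch" "$wd/bin/vsearch113"
    result="ok"
fi
echo "$result"
```

Running the same snippet a second time takes the first branch, which is what makes the real setup function safe to re-run.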
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/outputs.txt	Thu Dec 05 11:48:01 2019 +0000
@@ -0,0 +1,41 @@
+ok.. Metatable_log/Metatable_mod.txt
+ok.. Vsearch_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom
+ok.. Vsearch_OTU_tables/otus.tre
+ok.. RESULTS/Vsearch_gg/OTUs_count.txt
+ok.. RESULTS/Vsearch_gg/table_summary.txt
+ok.. Multiplexed_files/Vsearch_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta
+ok.. QUALITY_CONTROL/Reads_count.txt
+ok.. fastqc_quality_boxplots.html -> generated by the Python wrapper
+NO.. RESULTS/Vsearch_gg/Heatmap/js -> RESULTS/Vsearch_gg/Heatmap.pdf
+NO.. RESULTS/Vsearch_gg/Heatmap/otu_table.html -> MISSING
+ok.. RESULTS/Vsearch_gg/phylum_genus_charts/charts/
+ok.. RESULTS/Vsearch_gg/phylum_genus_charts/raw_data/
+ok.. RESULTS/Vsearch_gg/phylum_genus_charts/bar_charts.html
+ok.. RESULTS/Vsearch_gg/beta_div_even/weighted_2d_plot/*
+ok.. RESULTS/Vsearch_gg/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html
+ok.. RESULTS/Vsearch_gg/beta_div_even/unweighted_2d_plot/*
+ok.. RESULTS/Vsearch_gg/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html
+ok.. RESULTS/Vsearch_gg/Alpha_diversity/rarefaction_curves/rarefaction_plots.html
+ok.. RESULTS/Vsearch_gg/Alpha_diversity/rarefaction_curves/average_plots
+ok.. RESULTS/Vsearch_gg/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf
+
+??.. Metatable_log/Metatable_mod.txt
+NO.. DADA2_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom
+ok.. DADA2_OTU_tables/otus.tre
+ok.. RESULTS/DADA2_silva/OTUs_count.txt
+ok.. RESULTS/DADA2_silva/table_summary.txt
+ok.. Multiplexed_files/DADA2_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta --> DADA2_OTU_tables/seqs.fa
+NO.. QUALITY_CONTROL/Reads_count.txt -> Vsearch only
+ok.. fastqc_quality_boxplots.html -> generated by the Python wrapper
+NO.. RESULTS/DADA2_silva/Heatmap/js -> RESULTS/DADA2_silva/Heatmap.pdf
+NO.. RESULTS/DADA2_silva/Heatmap/otu_table.html
+ok.. RESULTS/DADA2_silva/phylum_genus_charts/charts/
+ok.. RESULTS/DADA2_silva/phylum_genus_charts/raw_data/
+ok.. RESULTS/DADA2_silva/phylum_genus_charts/bar_charts.html
+ok.. RESULTS/DADA2_silva/beta_div_even/weighted_2d_plot/*
+ok.. RESULTS/DADA2_silva/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html
+ok.. RESULTS/DADA2_silva/beta_div_even/unweighted_2d_plot/*
+ok.. RESULTS/DADA2_silva/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html
+ok.. RESULTS/DADA2_silva/Alpha_diversity/rarefaction_curves/rarefaction_plots.html
+ok.. RESULTS/DADA2_silva/Alpha_diversity/rarefaction_curves/average_plots
+ok.. RESULTS/DADA2_silva/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf -> missing? (didn't include categories?)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml	Thu Dec 05 11:48:01 2019 +0000
@@ -0,0 +1,16 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="amplicon_analysis_pipeline" version="1.3.5">
+        <install version="1.0">
+            <actions>
+                <action type="download_file">https://raw.githubusercontent.com/pjbriggs/Amplicon_analysis-galaxy/update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis-1.3.5.sh</action>
+                <action type="shell_command">
+                    sh ./install_amplicon_analysis-1.3.5.sh $INSTALL_DIR
+                </action>
+                <action type="set_environment">
+                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/Amplicon_analysis-1.3.5/bin</environment_variable>
+                </action>
+            </actions>
+        </install>
+    </package>
+</tool_dependency>
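The three `<action>` elements in `tool_dependencies.xml` boil down to: download the installer, run it with Galaxy's `$INSTALL_DIR`, then prepend the resulting `bin` directory to `PATH`. A hedged shell sketch of that sequence, with the download stubbed out by a placeholder script (the real installer is fetched from GitHub):

```shell
# Simulate Galaxy's three dependency-install actions; the installer is a stub
set -e
work=$(mktemp -d)
cd "$work"
INSTALL_DIR=$work/deps
mkdir -p "$INSTALL_DIR"
# 1. download_file: fetch install_amplicon_analysis-1.3.5.sh (stubbed here)
printf 'mkdir -p "$1/Amplicon_analysis-1.3.5/bin"\n' > install_amplicon_analysis-1.3.5.sh
# 2. shell_command: run the installer, passing $INSTALL_DIR
sh ./install_amplicon_analysis-1.3.5.sh "$INSTALL_DIR"
# 3. set_environment: prepend the tool's bin directory to PATH
export PATH="$INSTALL_DIR/Amplicon_analysis-1.3.5/bin:$PATH"
echo "$PATH" | cut -d: -f1
```

Note that the script name run in step 2 must match the basename of the downloaded file, since `download_file` saves it under the URL's basename.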
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/updating-to-pipeline-1.3-DADA2.txt	Thu Dec 05 11:48:01 2019 +0000
@@ -0,0 +1,58 @@
+Notes on updating Galaxy tool to pipeline 1.3 (DADA2)
+=====================================================
+
+Where stuff is:
+
+* projects/Amplicon_analysis-galaxy: git repo for the Galaxy tool
+  (these developments are in the 'update-to-Amplicon_analysis_pipeline-1.3'
+  branch, PR #50:
+  https://github.com/pjbriggs/Amplicon_analysis-galaxy/pull/50)
+
+* scratchpad/test_Amplicon_analysis_pipeline_DADA2: directory for
+  running/testing the updates
+
+So far:
+
+* Updated the installer for pipeline version 1.3.2
+
+* Have been trying to run the pipeline manually outside of Galaxy
+  on popov & CSF3:
+  -- DADA2 works on popov (can't remember if it works on CSF3)
+  -- Vsearch pipeline fails on popov and CSF3 (but the errors are
+     different)
+
+* Mauro is looking at fixing the errors while I carry on trying
+  to update the Galaxy tool
+
+Random notes from my notebook:
+
+p44:
+
+* DADA2 uses the NSLOTS environment variable from the local environment
+  (so it can get the number of cores on the cluster; if NSLOTS is not
+  set it falls back to the number of cores on the local machine)
+
+* DADA2 has new outputs:
+  -- DADA2_OTU_tables/Error_rate_plots/ <-- need to capture all
+     PDFs from this folder
+
+pp78-79:
+
+* The Galaxy wrapper could check that the 'Run' column is present in the
+  supplied metatable file (if it is missing then the pipeline will now
+  fail)
+
+* DADA2 has its own reference database
+
+* DADA2 produces the same outputs as Vsearch (with names changed from
+  "Vsearch_*" to "DADA2_*"), plus extras:
+  -- Vsearch_OTUs.tre -> otus.tre
+  -- Vsearch_multiplexed_linearised_dereplicated_mc2_repset_nonchimeras_OTUS.fasta -> seqs.fa
+  -- There might be issues with the heatmap
+
+p83: notes on progress...
+
+p95:
+
+* Confirms the heatmap is now e.g. RESULTS/Vsearch_silva/Heatmap.pdf
+  (instead of the HTML output)
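The NSLOTS behaviour noted on p44 (use the scheduler's slot count if set, otherwise the local core count) can be sketched as a one-line fallback; `ncores` is a hypothetical variable name and `nproc` (GNU coreutils) is assumed available:

```shell
# Prefer the scheduler-provided NSLOTS; fall back to counting local cores
ncores=${NSLOTS:-$(nproc)}
echo "Running with $ncores cores"
```

Grid Engine sets NSLOTS for batch jobs, so the same command line behaves sensibly both on the cluster and on a workstation.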