changeset 42:098ad1dd7760 draft
planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit 10be6f00106e853a6720e4052871d9d84e027137
author      pjbriggs
date        Thu, 05 Dec 2019 11:48:01 +0000
parents     7b9786a43a16
children    496cc0ddce3d
files       Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/.gitignore
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/.shed.yml
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/README.rst
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/amplicon_analysis_pipeline.py
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/amplicon_analysis_pipeline.xml
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis-1.3.5.sh
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis.sh
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig1.png
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig2.png
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig3.png
            Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/tool_dependencies.xml
            README.rst
            amplicon_analysis_pipeline.py
            amplicon_analysis_pipeline.xml
            install_amplicon_analysis-1.3.5.sh
            install_amplicon_analysis.sh
            outputs.txt
            static/images/Pipeline_description_Fig1.png
            static/images/Pipeline_description_Fig2.png
            static/images/Pipeline_description_Fig3.png
            tool_dependencies.xml
            updating-to-pipeline-1.3-DADA2.txt
diffstat    22 files changed, 2019 insertions(+), 1943 deletions(-)
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/.gitignore	Thu Dec 05 11:44:03 2019 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,7 +0,0 @@
-\#*\#
-.\#*
-*~
-*.pyc
-*.bak
-auto_process_settings_local.py
-settings.ini
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/.shed.yml	Thu Dec 05 11:44:03 2019 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,16 +0,0 @@
----
-categories:
-- Metagenomics
-description: Analyse paired-end 16S rRNA data from Illumina Miseq
-homepage_url: https://github.com/MTutino/Amplicon_analysis
-long_description: |
-  A Galaxy tool wrapper to Mauro Tutino's Amplicon_analysis pipeline
-  at https://github.com/MTutino/Amplicon_analysis
-
-  The pipeline can analyse paired-end 16S rRNA data from Illumina Miseq
-  (Casava >= 1.8) and performs: QC and clean up of input data; removal of
-  singletons and chimeras and building of OTU table and phylogenetic tree;
-  beta and alpha diversity analysis
-name: amplicon_analysis_pipeline
-owner: pjbriggs
-remote_repository_url: https://github.com/pjbriggs/Amplicon_analysis-galaxy
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/README.rst	Thu Dec 05 11:44:03 2019 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,213 +0,0 @@
-Amplicon_analysis-galaxy
-========================
-
-A Galaxy tool wrapper to Mauro Tutino's ``Amplicon_analysis`` pipeline
-script at https://github.com/MTutino/Amplicon_analysis
-
-The pipeline can analyse paired-end 16S rRNA data from Illumina Miseq
-(Casava >= 1.8) and performs the following operations:
-
- * QC and clean up of input data
- * Removal of singletons and chimeras and building of OTU table
-   and phylogenetic tree
- * Beta and alpha diversity analysis
-
-Usage documentation
-===================
-
-Usage of the tool (including required inputs) is documented within
-the ``help`` section of the tool XML.
-
-Installing the tool in a Galaxy instance
-========================================
-
-The following sections describe how to install the tool files,
-dependencies and reference data, and how to configure the Galaxy
-instance to detect the dependencies and reference data correctly
-at run time.
-
-1. Install the tool from the toolshed
--------------------------------------
-
-The core tool is hosted on the Galaxy toolshed, so it can be installed
-directly from there (this is the recommended route):
-
- * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/
-
-Alternatively it can be installed manually; in this case there are two
-files to install:
-
- * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
- * ``amplicon_analysis_pipeline.py`` (the Python wrapper script)
-
-Put these in a directory that is visible to Galaxy (e.g. a
-``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
-file to tell Galaxy to offer the tool by adding a line, e.g.::
-
-    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
-
-2. Install the reference data
------------------------------
-
-The script ``References.sh`` from the pipeline package at
-https://github.com/MTutino/Amplicon_analysis can be run to install
-the reference data, for example::
-
-    cd /path/to/pipeline/data
-    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
-    /bin/bash ./References.sh
-
-will install the data in ``/path/to/pipeline/data``.
-
-**NB** The final amount of data downloaded and uncompressed will be
-around 9GB.
-
-3. Configure reference data location in Galaxy
-----------------------------------------------
-
-The final step is to make your Galaxy installation aware of the
-location of the reference data, so that the reference data can be
-located when the tool is run.
-
-The tool locates the reference data via an environment variable called
-``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
-directory where the reference data has been installed.
-
-There are various ways to do this, depending on how your Galaxy
-installation is configured:
-
- * **For local instances:** add a line to set it in the
-   ``config/local_env.sh`` file of your Galaxy installation (you
-   may need to create a new empty file first), e.g.::
-
-       export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data
-
- * **For production instances:** set the value in the ``job_conf.xml``
-   configuration file, e.g.::
-
-       <destination id="amplicon_analysis">
-           <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
-       </destination>
-
-   and then specify that the pipeline tool uses this destination::
-
-       <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>
-
-   (For more about job destinations see the Galaxy documentation at
-   https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations)
-
-4. Enable rendering of HTML outputs from pipeline
--------------------------------------------------
-
-To ensure that HTML outputs are displayed correctly in Galaxy
-(for example the Vsearch OTU table heatmaps), Galaxy needs to be
-configured not to sanitize the outputs from the ``Amplicon_analysis``
-tool.
-
-Either:
-
- * **For local instances:** set ``sanitize_all_html = False`` in
-   ``config/galaxy.ini`` (NB don't do this on production servers or
-   public instances!); or
-
- * **For production instances:** add the ``Amplicon_analysis`` tool
-   to the display whitelist in the Galaxy instance:
-
-   - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
-     ``config/galaxy.ini`` and restart Galaxy;
-   - Go to ``Admin>Manage Display Whitelist``, check the box for
-     ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
-     search function to help locate it) and click on
-     ``Submit new whitelist`` to update the settings.
-
-Additional details
-==================
-
-Some other things to be aware of:
-
- * Note that using the Silva database requires a minimum of 18GB of RAM
-
-Known problems
-==============
-
- * Only the ``VSEARCH`` pipeline in Mauro's script is currently
-   available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
-   pipelines have yet to be implemented.
- * The images in the tool help section are not visible if the
-   tool has been installed locally, or if it has been installed in
-   a Galaxy instance which is served from a subdirectory.
-
-   These are both problems with Galaxy and not the tool, see
-   https://github.com/galaxyproject/galaxy/issues/4490 and
-   https://github.com/galaxyproject/galaxy/issues/1676
-
-Appendix: installing the dependencies manually
-==============================================
-
-If the tool is installed from the Galaxy toolshed (recommended) then
-the dependencies should be installed automatically and this step can
-be skipped.
-
-Otherwise the ``install_amplicon_analysis.sh`` script can be used
-to fetch and install the dependencies locally, for example::
-
-    install_amplicon_analysis.sh /path/to/local_tool_dependencies
-
-(This is the same script as is used to install dependencies from the
-toolshed.) This can take some time to complete; when finished it will
-have created a directory called ``Amplicon_analysis-1.2.3`` containing
-the dependencies under the specified top level directory.
-
-**NB** The installed dependencies will occupy around 2.6GB of disk
-space.
-
-You will need to make sure that the ``bin`` subdirectory of this
-directory is on Galaxy's ``PATH`` at runtime, for the tool to be able
-to access the dependencies - for example by adding a line to the
-``local_env.sh`` file like::
-
-    export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH
-
-History
-=======
-
-========== ======================================================================
-Version    Changes
----------- ----------------------------------------------------------------------
-1.3.5.0    Updated to Amplicon_Analysis_Pipeline version 1.3.5.
-1.2.3.0    Updated to Amplicon_Analysis_Pipeline version 1.2.3; install
-           dependencies via tool_dependencies.xml.
-1.2.2.0    Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes
-           jackknifed analysis which is not captured by Galaxy tool)
-1.2.1.0    Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds
-           option to use the Human Oral Microbiome Database v15.1, and
-           updates SILVA database to v123)
-1.1.0      First official version on Galaxy toolshed.
-1.0.6      Expand inline documentation to provide detailed usage guidance.
-1.0.5      Updates including:
-
-           - Capture read counts from quality control as new output dataset
-           - Capture FastQC per-base quality boxplots for each sample as
-             new output dataset
-           - Add support for -l option (sliding window length for trimming)
-           - Default for -L set to "200"
-1.0.4      Various updates:
-
-           - Additional outputs are captured when a "Categories" file is
-             supplied (alpha diversity rarefaction curves and boxplots)
-           - Sample names derived from Fastqs in a collection of pairs
-             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
-           - Input Fastqs can now be of more general ``fastq`` type
-           - Log file outputs are captured in new output dataset
-           - User can specify a "title" for the job which is copied into
-             the dataset names (to distinguish outputs from different runs)
-           - Improved detection and reporting of problems with input
-             Metatable
-1.0.3      Take the sample names from the collection dataset names when
-           using collection as input (this is now the default input mode);
-           collect additional output dataset; disable ``usearch``-based
-           pipelines (i.e. ``UPARSE`` and ``QIIME``).
-1.0.2      Enable support for FASTQs supplied via dataset collections and
-           fix some broken output datasets.
-1.0.1      Initial version
-========== ======================================================================
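The "trimmed to SAMPLE_S*" behaviour noted in the 1.0.4 changes strips the file extension and any trailing Illumina lane/read-set suffixes (e.g. ``_L001_001``) from a Fastq name. A minimal standalone sketch of that trimming (the function name ``trim_sample_name`` is illustrative only; the wrapper script implements this as ``clean_up_name``):

```python
def trim_sample_name(filename):
    # Drop everything from the first "." (e.g. ".fastq" or ".fastq.gz")
    name = filename.split('.')[0]
    parts = name.split('_')
    # Drop a trailing Illumina read-set chunk ("001")
    if parts and parts[-1] == "001":
        parts = parts[:-1]
    # Drop a trailing lane chunk such as "L001"
    if parts and parts[-1].startswith('L') and parts[-1][1:].isdigit():
        parts = parts[:-1]
    return '_'.join(parts)

print(trim_sample_name("SAMPLE_S19_L001_001.fastq"))  # -> SAMPLE_S19
```

A plain name such as ``SampleA.fastq`` simply loses its extension, so non-Illumina-style names pass through unchanged.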
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/amplicon_analysis_pipeline.py	Thu Dec 05 11:44:03 2019 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,370 +0,0 @@
-#!/usr/bin/env python
-#
-# Wrapper script to run Amplicon_analysis_pipeline.sh
-# from Galaxy tool
-
-import sys
-import os
-import argparse
-import subprocess
-import glob
-
-class PipelineCmd(object):
-    def __init__(self,cmd):
-        self.cmd = [str(cmd)]
-    def add_args(self,*args):
-        for arg in args:
-            self.cmd.append(str(arg))
-    def __repr__(self):
-        return ' '.join([str(arg) for arg in self.cmd])
-
-def ahref(target,name=None,type=None):
-    if name is None:
-        name = os.path.basename(target)
-    ahref = "<a href='%s'" % target
-    if type is not None:
-        ahref += " type='%s'" % type
-    ahref += ">%s</a>" % name
-    return ahref
-
-def check_errors():
-    # Errors in Amplicon_analysis_pipeline.log
-    with open('Amplicon_analysis_pipeline.log','r') as pipeline_log:
-        log = pipeline_log.read()
-    if "Names in the first column of Metatable.txt and in the second column of Final_name.txt do not match" in log:
-        print_error("""*** Sample IDs don't match dataset names ***
-
-The sample IDs (first column of the Metatable file) don't match the
-supplied sample names for the input Fastq pairs.
-""")
-    # Errors in pipeline output
-    with open('pipeline.log','r') as pipeline_log:
-        log = pipeline_log.read()
-    if "Errors and/or warnings detected in mapping file" in log:
-        with open("Metatable_log/Metatable.log","r") as metatable_log:
-            # Echo the Metatable log file to the tool log
-            print_error("""*** Error in Metatable mapping file ***
-
-%s""" % metatable_log.read())
-    elif "No header line was found in mapping file" in log:
-        # Report error to the tool log
-        print_error("""*** No header in Metatable mapping file ***
-
-Check you've specified the correct file as the input Metatable""")
-
-def print_error(message):
-    width = max([len(line) for line in message.split('\n')]) + 4
-    sys.stderr.write("\n%s\n" % ('*'*width))
-    for line in message.split('\n'):
-        sys.stderr.write("* %s%s *\n" % (line,' '*(width-len(line)-4)))
-    sys.stderr.write("%s\n\n" % ('*'*width))
-
-def clean_up_name(sample):
-    # Remove extensions and trailing "_L[0-9]+_001" from
-    # Fastq pair names
-    sample_name = '.'.join(sample.split('.')[:1])
-    split_name = sample_name.split('_')
-    if split_name[-1] == "001":
-        split_name = split_name[:-1]
-    if split_name[-1].startswith('L'):
-        try:
-            int(split_name[-1][1:])
-            split_name = split_name[:-1]
-        except ValueError:
-            pass
-    return '_'.join(split_name)
-
-def list_outputs(filen=None):
-    # List the output directory contents
-    # If filen is specified then will be the filename to
-    # write to, otherwise write to stdout
-    if filen is not None:
-        fp = open(filen,'w')
-    else:
-        fp = sys.stdout
-    results_dir = os.path.abspath("RESULTS")
-    fp.write("Listing contents of output dir %s:\n" % results_dir)
-    ix = 0
-    for d,dirs,files in os.walk(results_dir):
-        ix += 1
-        fp.write("-- %d: %s\n" % (ix,
-                                  os.path.relpath(d,results_dir)))
-        for f in files:
-            ix += 1
-            fp.write("---- %d: %s\n" % (ix,
-                                        os.path.relpath(f,results_dir)))
-    # Close output file
-    if filen is not None:
-        fp.close()
-
-if __name__ == "__main__":
-    # Command line
-    print "Amplicon analysis: starting"
-    p = argparse.ArgumentParser()
-    p.add_argument("metatable",
-                   metavar="METATABLE_FILE",
-                   help="Metatable.txt file")
-    p.add_argument("fastq_pairs",
-                   metavar="SAMPLE_NAME FQ_R1 FQ_R2",
-                   nargs="+",
-                   default=list(),
-                   help="Triplets of SAMPLE_NAME followed by "
-                   "a R1/R2 FASTQ file pair")
-    p.add_argument("-g",dest="forward_pcr_primer")
-    p.add_argument("-G",dest="reverse_pcr_primer")
-    p.add_argument("-q",dest="trimming_threshold")
-    p.add_argument("-O",dest="minimum_overlap")
-    p.add_argument("-L",dest="minimum_length")
-    p.add_argument("-l",dest="sliding_window_length")
-    p.add_argument("-P",dest="pipeline",
-                   choices=["Vsearch","DADA2"],
-                   type=str,
-                   default="Vsearch")
-    p.add_argument("-S",dest="use_silva",action="store_true")
-    p.add_argument("-H",dest="use_homd",action="store_true")
-    p.add_argument("-r",dest="reference_data_path")
-    p.add_argument("-c",dest="categories_file")
-    args = p.parse_args()
-
-    # Build the environment for running the pipeline
-    print "Amplicon analysis: building the environment"
-    metatable_file = os.path.abspath(args.metatable)
-    os.symlink(metatable_file,"Metatable.txt")
-    print "-- made symlink to Metatable.txt"
-
-    # Link to Categories.txt file (if provided)
-    if args.categories_file is not None:
-        categories_file = os.path.abspath(args.categories_file)
-        os.symlink(categories_file,"Categories.txt")
-        print "-- made symlink to Categories.txt"
-
-    # Link to FASTQs and construct Final_name.txt file
-    sample_names = []
-    print "-- making Final_name.txt"
-    with open("Final_name.txt",'w') as final_name:
-        fastqs = iter(args.fastq_pairs)
-        for sample_name,fqr1,fqr2 in zip(fastqs,fastqs,fastqs):
-            sample_name = clean_up_name(sample_name)
-            print "   %s" % sample_name
-            r1 = "%s_R1_.fastq" % sample_name
-            r2 = "%s_R2_.fastq" % sample_name
-            os.symlink(fqr1,r1)
-            os.symlink(fqr2,r2)
-            final_name.write("%s\n" % '\t'.join((r1,sample_name)))
-            final_name.write("%s\n" % '\t'.join((r2,sample_name)))
-            sample_names.append(sample_name)
-
-    # Reference database
-    if args.use_silva:
-        ref_database = "silva"
-    elif args.use_homd:
-        ref_database = "homd"
-    else:
-        ref_database = "gg"
-
-    # Construct the pipeline command
-    print "Amplicon analysis: constructing pipeline command"
-    pipeline = PipelineCmd("Amplicon_analysis_pipeline.sh")
-    if args.forward_pcr_primer:
-        pipeline.add_args("-g",args.forward_pcr_primer)
-    if args.reverse_pcr_primer:
-        pipeline.add_args("-G",args.reverse_pcr_primer)
-    if args.trimming_threshold:
-        pipeline.add_args("-q",args.trimming_threshold)
-    if args.minimum_overlap:
-        pipeline.add_args("-O",args.minimum_overlap)
-    if args.minimum_length:
-        pipeline.add_args("-L",args.minimum_length)
-    if args.sliding_window_length:
-        pipeline.add_args("-l",args.sliding_window_length)
-    if args.reference_data_path:
-        pipeline.add_args("-r",args.reference_data_path)
-    pipeline.add_args("-P",args.pipeline)
-    if ref_database == "silva":
-        pipeline.add_args("-S")
-    elif ref_database == "homd":
-        pipeline.add_args("-H")
-
-    # Echo the pipeline command to stdout
-    print "Running %s" % pipeline
-
-    # Run the pipeline
-    with open("pipeline.log","w") as pipeline_out:
-        try:
-            subprocess.check_call(pipeline.cmd,
-                                  stdout=pipeline_out,
-                                  stderr=subprocess.STDOUT)
-            exit_code = 0
-            print "Pipeline completed ok"
-        except subprocess.CalledProcessError as ex:
-            # Non-zero exit status
-            sys.stderr.write("Pipeline failed: exit code %s\n" %
-                             ex.returncode)
-            exit_code = ex.returncode
-        except Exception as ex:
-            # Some other problem
-            sys.stderr.write("Unexpected error: %s\n" % str(ex))
-            exit_code = 1
-
-    # Write out the list of outputs
-    outputs_file = "Pipeline_outputs.txt"
-    list_outputs(outputs_file)
-
-    # Check for log file
-    log_file = "Amplicon_analysis_pipeline.log"
-    if os.path.exists(log_file):
-        print "Found log file: %s" % log_file
-        if exit_code == 0:
-            # Create an HTML file to link to log files etc
-            # NB the paths to the files should be correct once
-            # copied by Galaxy on job completion
-            with open("pipeline_outputs.html","w") as html_out:
-                html_out.write("""<html>
-<head>
-<title>Amplicon analysis pipeline: log files</title>
-</head>
-<body>
-<h1>Amplicon analysis pipeline: log files</h1>
-<ul>
-""")
-                html_out.write(
-                    "<li>%s</li>\n" %
-                    ahref("Amplicon_analysis_pipeline.log",
-                          type="text/plain"))
-                html_out.write(
-                    "<li>%s</li>\n" %
-                    ahref("pipeline.log",type="text/plain"))
-                html_out.write(
-                    "<li>%s</li>\n" %
-                    ahref("Pipeline_outputs.txt",
-                          type="text/plain"))
-                html_out.write(
-                    "<li>%s</li>\n" %
-                    ahref("Metatable.html"))
-                html_out.write("""</ul>
-</body>
-</html>
-""")
-        else:
-            # Check for known error messages
-            check_errors()
-            # Write pipeline stdout to tool stderr
-            sys.stderr.write("\nOutput from pipeline:\n")
-            with open("pipeline.log",'r') as log:
-                sys.stderr.write("%s" % log.read())
-            # Write log file contents to tool log
-            print "\nAmplicon_analysis_pipeline.log:"
-            with open(log_file,'r') as log:
-                print "%s" % log.read()
-    else:
-        sys.stderr.write("ERROR missing log file \"%s\"\n" %
-                         log_file)
-
-    # Handle FastQC boxplots
-    print "Amplicon analysis: collating per base quality boxplots"
-    with open("fastqc_quality_boxplots.html","w") as quality_boxplots:
-        # PHRED value for trimming
-        phred_score = 20
-        if args.trimming_threshold is not None:
-            phred_score = args.trimming_threshold
-        # Write header for HTML output file
-        quality_boxplots.write("""<html>
-<head>
-<title>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</title>
-</head>
-<body>
-<h1>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</h1>
-""")
-        # Look for raw and trimmed FastQC output for each sample
-        for sample_name in sample_names:
-            fastqc_dir = os.path.join(sample_name,"FastQC")
-            quality_boxplots.write("<h2>%s</h2>" % sample_name)
-            for d in ("Raw","cutdapt_sickle/Q%s" % phred_score):
-                quality_boxplots.write("<h3>%s</h3>" % d)
-                fastqc_html_files = glob.glob(
-                    os.path.join(fastqc_dir,d,"*_fastqc.html"))
-                if not fastqc_html_files:
-                    quality_boxplots.write("<p>No FastQC outputs found</p>")
-                    continue
-                # Pull out the per-base quality boxplots
-                for f in fastqc_html_files:
-                    boxplot = None
-                    with open(f) as fp:
-                        for line in fp.read().split(">"):
-                            try:
-                                line.index("alt=\"Per base quality graph\"")
-                                boxplot = line + ">"
-                                break
-                            except ValueError:
-                                pass
-                    if boxplot is None:
-                        boxplot = "Missing plot"
-                    quality_boxplots.write("<h4>%s</h4><p>%s</p>" %
-                                           (os.path.basename(f),
-                                            boxplot))
-        quality_boxplots.write("""</body>
-</html>
-""")
-
-    # Handle DADA2 error rate plot PDFs
-    if args.pipeline == "DADA2":
-        print "Amplicon analysis: collecting error rate plots"
-        error_rate_plots_dir = os.path.abspath(
-            os.path.join("DADA2_OTU_tables",
-                         "Error_rate_plots"))
-        error_rate_plot_pdfs = [os.path.basename(pdf)
-                                for pdf in
-                                sorted(glob.glob(
-                                    os.path.join(error_rate_plots_dir,"*.pdf")))]
-        with open("error_rate_plots.html","w") as error_rate_plots_out:
-            error_rate_plots_out.write("""<html>
-<head>
-<title>Amplicon analysis pipeline: DADA2 Error Rate Plots</title>
-</head>
-<body>
-<h1>Amplicon analysis pipeline: DADA2 Error Rate Plots</h1>
-""")
-            error_rate_plots_out.write("<ul>\n")
-            for pdf in error_rate_plot_pdfs:
-                error_rate_plots_out.write("<li>%s</li>\n" % ahref(pdf))
-            error_rate_plots_out.write("</ul>\n")
-            error_rate_plots_out.write("""</body>
-</html>
-""")
-
-    # Handle additional output when categories file was supplied
-    if args.categories_file is not None:
-        # Alpha diversity boxplots
-        print "Amplicon analysis: indexing alpha diversity boxplots"
-        boxplots_dir = os.path.abspath(
-            os.path.join("RESULTS",
-                         "%s_%s" % (args.pipeline,
-                                    ref_database),
-                         "Alpha_diversity",
-                         "Alpha_diversity_boxplot",
-                         "Categories_shannon"))
-        print "Amplicon analysis: gathering PDFs from %s" % boxplots_dir
-        boxplot_pdfs = [os.path.basename(pdf)
-                        for pdf in
-                        sorted(glob.glob(
-                            os.path.join(boxplots_dir,"*.pdf")))]
-        with open("alpha_diversity_boxplots.html","w") as boxplots_out:
-            boxplots_out.write("""<html>
-<head>
-<title>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</title>
-</head>
-<body>
-<h1>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</h1>
-""")
-            boxplots_out.write("<ul>\n")
-            for pdf in boxplot_pdfs:
-                boxplots_out.write("<li>%s</li>\n" % ahref(pdf))
-            boxplots_out.write("</ul>\n")
-            boxplots_out.write("""</body>
-</html>
-""")
-
-    # Finish
-    print "Amplicon analysis: finishing, exit code: %s" % exit_code
-    sys.exit(exit_code)
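The wrapper above consumes its flat positional `fastq_pairs` argument list three items at a time using the zip-over-a-single-iterator idiom (`zip(fastqs,fastqs,fastqs)`). A standalone sketch of how that grouping behaves (the helper name `group_triplets` is illustrative only):

```python
def group_triplets(flat_args):
    # zip'ing three references to the same iterator yields
    # consecutive, non-overlapping (sample, R1, R2) triplets
    it = iter(flat_args)
    return list(zip(it, it, it))

args = ["SampleA", "A_R1.fq", "A_R2.fq",
        "SampleB", "B_R1.fq", "B_R2.fq"]
print(group_triplets(args))
# -> [('SampleA', 'A_R1.fq', 'A_R2.fq'), ('SampleB', 'B_R1.fq', 'B_R2.fq')]
```

Note that any trailing arguments that do not form a complete triplet are silently dropped by `zip`, which is why the tool XML always emits name/R1/R2 together for each pair.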
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/amplicon_analysis_pipeline.xml Thu Dec 05 11:44:03 2019 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,502 +0,0 @@ -<tool id="amplicon_analysis_pipeline" name="Amplicon Analysis Pipeline" version="1.3.5.0"> - <description>analyse 16S rRNA data from Illumina Miseq paired-end reads</description> - <requirements> - <requirement type="package" version="1.3.5">amplicon_analysis_pipeline</requirement> - </requirements> - <stdio> - <exit_code range="1:" /> - </stdio> - <command><![CDATA[ - - ## Convenience variable for pipeline name - #set $pipeline_name = $pipeline.pipeline_name - - ## Set the reference database name - #if str( $pipeline_name ) == "DADA2" - #set reference_database_name = "silva" - #else - #set reference_database = $pipeline.reference_database - #if $reference_database == "-S" - #set reference_database_name = "silva" - #else if $reference_database == "-H" - #set reference_database_name = "homd" - #else - #set reference_database_name = "gg" - #end if - #end if - - ## Run the amplicon analysis pipeline wrapper - python $__tool_directory__/amplicon_analysis_pipeline.py - ## Set options - #if str( $forward_pcr_primer ) != "" - -g "$forward_pcr_primer" - #end if - #if str( $reverse_pcr_primer ) != "" - -G "$reverse_pcr_primer" - #end if - #if str( $trimming_threshold ) != "" - -q $trimming_threshold - #end if - #if str( $sliding_window_length ) != "" - -l $sliding_window_length - #end if - #if str( $minimum_overlap ) != "" - -O $minimum_overlap - #end if - #if str( $minimum_length ) != "" - -L $minimum_length - #end if - -P $pipeline_name - -r \${AMPLICON_ANALYSIS_REF_DATA_PATH-ReferenceData} - #if str( $pipeline_name ) != "DADA2" - ${reference_database} - #end if - #if str($categories_file_in) != 'None' - -c "${categories_file_in}" - #end if - ## Input files - "${metatable_file_in}" - ## FASTQ pairs - #if str($input_type.pairs_or_collection) == "collection" - #set fastq_pairs = 
$input_type.fastq_collection - #else - #set fastq_pairs = $input_type.fastq_pairs - #end if - #for $fq_pair in $fastq_pairs - "${fq_pair.name}" "${fq_pair.forward}" "${fq_pair.reverse}" - #end for - && - - ## Collect outputs - cp Metatable_log/Metatable_mod.txt "${metatable_mod}" && - #if str( $pipeline_name ) == "Vsearch" - # Vsearch-specific - cp ${pipeline_name}_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom "${tax_otu_table_biom_file}" && - cp Multiplexed_files/${pipeline_name}_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta "${dereplicated_nonchimera_otus_fasta}" && - cp QUALITY_CONTROL/Reads_count.txt "$read_counts_out" && - #else - # DADA2-specific - cp ${pipeline_name}_OTU_tables/DADA2_tax_OTU_table.biom "${tax_otu_table_biom_file}" && - cp ${pipeline_name}_OTU_tables/seqs.fa "${dereplicated_nonchimera_otus_fasta}" && - #end if - cp ${pipeline_name}_OTU_tables/otus.tre "${otus_tre_file}" && - cp RESULTS/${pipeline_name}_${reference_database_name}/OTUs_count.txt "${otus_count_file}" && - cp RESULTS/${pipeline_name}_${reference_database_name}/table_summary.txt "${table_summary_file}" && - cp fastqc_quality_boxplots.html "${fastqc_quality_boxplots_html}" && - - ## OTU table heatmap - cp RESULTS/${pipeline_name}_${reference_database_name}/Heatmap.pdf "${heatmap_otu_table_pdf}"" && - - ## HTML outputs - - ## Phylum genus barcharts - mkdir $phylum_genus_dist_barcharts_html.files_path && - cp -r RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/charts $phylum_genus_dist_barcharts_html.files_path && - cp -r RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/raw_data $phylum_genus_dist_barcharts_html.files_path && - cp RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/bar_charts.html "${phylum_genus_dist_barcharts_html}" && - - ## Beta diversity weighted 2d plots - mkdir $beta_div_even_weighted_2d_plots.files_path && - cp -r 
RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/weighted_2d_plot/* $beta_div_even_weighted_2d_plots.files_path && - cp RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_weighted_2d_plots}" && - - ## Beta diversity unweighted 2d plots - mkdir $beta_div_even_unweighted_2d_plots.files_path && - cp -r RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/unweighted_2d_plot/* $beta_div_even_unweighted_2d_plots.files_path && - cp RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_unweighted_2d_plots}" && - - ## Alpha diversity rarefaction plots - mkdir $alpha_div_rarefaction_plots.files_path && - cp RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/rarefaction_curves/rarefaction_plots.html $alpha_div_rarefaction_plots && - cp -r RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/rarefaction_curves/average_plots $alpha_div_rarefaction_plots.files_path && - - ## DADA2 error rate plots - #if str($pipeline_name) == "DADA2" - mkdir $dada2_error_rate_plots.files_path && - cp DADA2_OTU_tables/Error_rate_plots/error_rate_plots.html $dada2_error_rate_plots && - cp -r DADA2_OTU_tables/Error_rate_plots/*.pdf $dada2_error_rate_plots.files_path && - #end if - - ## Categories data - #if str($categories_file_in) != 'None' - ## Alpha diversity boxplots - mkdir $alpha_div_boxplots.files_path && - cp alpha_diversity_boxplots.html "$alpha_div_boxplots" && - cp RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf $alpha_div_boxplots.files_path && - #end if - - ## Pipeline outputs (log files etc) - mkdir $log_files.files_path && - cp Amplicon_analysis_pipeline.log $log_files.files_path && - cp pipeline.log $log_files.files_path && - cp Pipeline_outputs.txt $log_files.files_path && - cp 
Metatable_log/Metatable.html $log_files.files_path && - cp pipeline_outputs.html "$log_files" - ]]></command> - <inputs> - <param name="title" type="text" value="test" size="25" - label="Title" help="Optional text that will be added to the output dataset names" /> - <param type="data" name="metatable_file_in" format="tabular" - label="Input Metatable.txt file" /> - <param type="data" name="categories_file_in" format="txt" - label="Input Categories.txt file" optional="true" - help="(optional)" /> - <conditional name="input_type"> - <param name="pairs_or_collection" type="select" - label="Input FASTQ type"> - <option value="pairs_of_files">Pairs of datasets</option> - <option value="collection" selected="true">Dataset pairs in a collection</option> - </param> - <when value="collection"> - <param name="fastq_collection" type="data_collection" - format="fastqsanger,fastq" collection_type="list:paired" - label="Collection of FASTQ forward and reverse (R1/R2) pairs" - help="Each FASTQ pair will be treated as one sample; the name of each sample will be taken from the first column of the Metatable file " /> - </when> - <when value="pairs_of_files"> - <repeat name="fastq_pairs" title="Input fastq pairs" min="1"> - <param type="text" name="name" value="" - label="Final name for FASTQ pair" /> - <param type="data" name="fastq_r1" format="fastqsanger,fastq" - label="FASTQ with forward reads (R1)" /> - <param type="data" name="fastq_r2" format="fastqsanger,fastq" - label="FASTQ with reverse reads (R2)" /> - </repeat> - </when> - </conditional> - <param type="text" name="forward_pcr_primer" value="" - label="Forward PCR primer sequence" - help="Optional; must not include barcode or adapter sequence (-g)" /> - <param type="text" name="reverse_pcr_primer" value="" - label="Reverse PCR primer sequence" - help="Optional; must not include barcode or adapter sequence (-G)" /> - <param type="integer" name="trimming_threshold" value="20" - label="Threshold quality below which read will 
be trimmed" - help="Phred score; default is 20 (-q)" /> - <param type="integer" name="minimum_overlap" value="10" - label="Minimum overlap in bp between forward and reverse reads" - help="Default is 10 (-O)" /> - <param type="integer" name="minimum_length" value="200" - label="Minimum length in bp to keep sequence after overlapping" - help="Default is 200 (-L)" /> - <param type="integer" name="sliding_window_length" value="10" - label="Minimum length in bp to retain a read after trimming" - help="Supplied to Sickle; default is 10 (-l)" /> - <conditional name="pipeline"> - <param type="select" name="pipeline_name" - label="Pipeline to use for analysis"> - <option value="Vsearch" selected="true" >Vsearch</option> - <option value="DADA2">DADA2</option> - </param> - <when value="Vsearch"> - <param type="select" name="reference_database" - label="Reference database"> - <option value="" selected="true">GreenGenes</option> - <option value="-S">Silva</option> - <option value="-H">Human Oral Microbiome Database (HOMD)</option> - </param> - </when> - <when value="DADA2"> - </when> - </conditional> - </inputs> - <outputs> - <data format="tabular" name="metatable_mod" - label="${tool.name}:${title} Metatable_mod.txt" /> - <data format="tabular" name="read_counts_out" - label="${tool.name} (${pipeline.pipeline_name}):${title} read counts"> - <filter>pipeline['pipeline_name'] == 'Vsearch'</filter> - </data> - <data format="biom" name="tax_otu_table_biom_file" - label="${tool.name} (${pipeline.pipeline_name}):${title} tax OTU table (biom format)" /> - <data format="tabular" name="otus_tre_file" - label="${tool.name} (${pipeline.pipeline_name}):${title} otus.tre" /> - <data format="html" name="phylum_genus_dist_barcharts_html" - label="${tool.name} (${pipeline.pipeline_name}):${title} phylum genus dist barcharts HTML" /> - <data format="tabular" name="otus_count_file" - label="${tool.name} (${pipeline.pipeline_name}):${title} OTUs count file" /> - <data format="tabular" 
name="table_summary_file" - label="${tool.name} (${pipeline.pipeline_name}):${title} table summary file" /> - <data format="fasta" name="dereplicated_nonchimera_otus_fasta" - label="${tool.name} (${pipeline.pipeline_name}):${title} multiplexed linearized dereplicated mc2 repset nonchimeras OTUs FASTA" /> - <data format="html" name="fastqc_quality_boxplots_html" - label="${tool.name} (${pipeline.pipeline_name}):${title} FastQC per-base quality boxplots HTML" /> - <data format="pdf" name="heatmap_otu_table_pdf" - label="${tool.name} (${pipeline.pipeline_name}):${title} heatmap OTU table PDF" /> - <data format="html" name="beta_div_even_weighted_2d_plots" - label="${tool.name} (${pipeline.pipeline_name}):${title} beta diversity weighted 2D plots HTML" /> - <data format="html" name="beta_div_even_unweighted_2d_plots" - label="${tool.name} (${pipeline.pipeline_name}):${title} beta diversity unweighted 2D plots HTML" /> - <data format="html" name="alpha_div_rarefaction_plots" - label="${tool.name} (${pipeline.pipeline_name}):${title} alpha diversity rarefaction plots HTML" /> - <data format="html" name="dada2_error_rate_plots" - label="${tool.name} (${pipeline.pipeline_name}):${title} DADA2 error rate plots"> - <filter>pipeline['pipeline_name'] == 'DADA2'</filter> - </data> - <data format="html" name="alpha_div_boxplots" - label="${tool.name} (${pipeline.pipeline_name}):${title} alpha diversity boxplots"> - <filter>categories_file_in is not None</filter> - </data> - <data format="html" name="log_files" - label="${tool.name} (${pipeline.pipeline_name}):${title} log files" /> - </outputs> - <tests> - </tests> - <help><![CDATA[ - -What it does ------------- - -This pipeline has been designed for the analysis of 16S rRNA data from -Illumina Miseq (Casava >= 1.8) paired-end reads. - -Usage ------ - -1. 
Preparation of the mapping file and format of unique sample id -***************************************************************** - -Before using the amplicon analysis pipeline it is necessary to -follow the steps below to avoid analysis failures and ensure samples -are labelled appropriately. Sample names for the labelling are derived -from the fastq file names that are generated by the sequencing. The -labels will include everything between the beginning of the name and -the sample number (from C11 to S19 in Fig. 1). - -.. image:: Pipeline_description_Fig1.png - :height: 46 - :width: 382 - -**Figure 1** - -If analysing 16S data from multiple runs: - -The samples from different runs may have identical IDs. For example, -when sequencing the same samples twice, these could by chance be at -the same position in both runs. This would cause the fastq files -to have exactly the same IDs (Fig. 2). - -.. image:: Pipeline_description_Fig2.png - :height: 100 - :width: 463 - -**Figure 2** - -In case of identical sample IDs the pipeline will fail to run and -will generate an error at the beginning of the analysis. - -To avoid having to change the file names, ensure before uploading the -files that the sample IDs are not repeated. - -2. To upload the file -********************* - -Click on **Get Data/Upload File** in the Galaxy tool panel on the -left hand side. - -From the pop-up window, choose how to upload the file. The -**Choose local file** option can be used for files up to 4Gb. Fastq files -from Illumina MiSeq will rarely be bigger than 4Gb and this option is -recommended. - -After choosing the files click **Start** to begin the upload. The window can -now be closed and the files will be uploaded onto the Galaxy server. You -will see the progress in the ``HISTORY`` panel on the right -side of the screen. The colour will change from grey (queuing) to yellow -(uploading) and finally green (uploaded). 
- -Once all the files are uploaded, click on the operations on multiple -datasets icon and select the fastq files that need to be analysed. -Click on the tab **For all selected...** and on the option -**Build List of Dataset pairs** (Fig. 3). - -.. image:: Pipeline_description_Fig3.png - :height: 247 - :width: 586 - -**Figure 3** - -Change the filter parameters ``_1`` and ``_2`` to be ``_R1`` and ``_R2``. -The forward (R1) and reverse (R2) fastq files should now appear in the -corresponding columns. - -Select **Autopair**. This creates a collection of paired fastq files for -the forward and reverse reads for each sample. The names of the pairs will -be the ones used by the pipeline. You are free to change the names at this -point as long as they are the same as those used in the Metatable file -(see section 3). - -Name the collection and click on **create list**. This reduces the time -required to input the forward and reverse reads for each individual sample. - -3. Create the Metatable files -***************************** - -Metatable.txt -~~~~~~~~~~~~~ - -Click on the list of pairs you just created to see the names of the single -pairs. The names of the pairs will be the ones used by the pipeline, -therefore, these are the names that need to be used in the Metatable file. - -The Metatable file has to be in QIIME format. You can find a description -of it on the QIIME website http://qiime.org/documentation/file_formats.html - -EXAMPLE:: - - #SampleID BarcodeSequence LinkerPrimerSequence Disease Gender Description - Mock-RUN1 TAAGGCGAGCGTAAGA PsA Male Control - Mock-RUN2 CGTACTAGGCGTAAGA PsA Male Control - Mock-RUN3 AGGCAGAAGCGTAAGA PsC Female Control - -Briefly: the column ``LinkerPrimerSequence`` is empty but it cannot be -deleted. The header is very important. ``#SampleID``, ``BarcodeSequence``, -``LinkerPrimerSequence`` and ``Description`` are mandatory. Between -``LinkerPrimerSequence`` and ``Description`` you can add as many columns -as you want. 
For every column a PCoA plot will be created (see -**Results** section). You can create this file in Excel and it will have -to be saved as ``Text(Tab delimited)``. - -During the analysis the Metatable.txt will be checked to ensure that the -file has the correct format. If necessary, this will be modified and will -be available as ``Metatable_mod.txt`` in the history panel. If you are -going to use the metatable file for any other statistical analyses, -remember to use the ``Metatable_mod.txt`` one, otherwise the sample -names might not match! - -Categories.txt (optional) -~~~~~~~~~~~~~~~~~~~~~~~~~ - -This file is required if you want to get box plots for comparison of -alpha diversity indices (see **Results** section). The file is a list -(without header and IN ONE COLUMN) of categories present in the -Metatable.txt file. THE NAMES YOU ARE USING HAVE TO BE THE SAME AS THE -ONES USED IN THE METATABLE.TXT. You can create this file in Excel and -it will have to be saved as ``Text(Tab delimited)``. - -EXAMPLE:: - - Disease - Gender - -Metatable and categories files can be uploaded using Get Data as done -with the fastq files. - -4. Analysis -*********** - -Under **Amplicon_Analysis_Pipeline** - - * **Title** Name to distinguish between the runs. It will be shown at - the beginning of each output file name. - - * **Input Metatable.txt file** Select the Metatable.txt file related to - this analysis. - - * **Input Categories.txt file (Optional)** Select the Categories.txt file - related to this analysis. - - * **Input FASTQ type** Select *Dataset pairs in a collection* and then - the collection of pairs you created earlier. - - * **Forward/Reverse PCR primer sequence** If the PCR primer sequences - have not been removed by the MiSeq during fastq creation, they - have to be removed before the analysis. Insert the PCR primer sequence - in the corresponding field. DO NOT include any barcode or adapter - sequence. 
If the PCR primers have already been trimmed by the MiSeq, - and you include the sequence in this field, this will lead to an error. - Only include the sequences if they are still present in the fastq files. - - * **Threshold quality below which reads will be trimmed** Choose the - Phred score used by Sickle to trim the reads at the 3’ end. - - * **Minimum length to retain a read after trimming** If the read length - after trimming is shorter than a user-defined length, the read, along - with the corresponding read pair, will be discarded. - - * **Minimum overlap in bp between forward and reverse reads** Choose the - minimum basepair overlap used by Pandaseq to assemble the reads. - Default is 10. - - * **Minimum length in bp to keep a sequence after overlapping** Choose the - minimum sequence length used by Pandaseq to keep a sequence after the - overlapping. This depends on the expected amplicon length. Default is - 380 (used for V3-V4 16S sequencing; expected length ~440bp). - - * **Pipeline to use for analysis** Choose the pipeline to use for OTU - clustering and chimera removal. The Galaxy tool supports the ``Vsearch`` - and ``DADA2`` pipelines. - - * **Reference database** Choose between ``GreenGenes``, ``Silva`` or - ``HOMD`` (Human Oral Microbiome Database) for taxa assignment. - -Click on **Execute** to start the analysis. - -5. Results -********** - -Results are entirely generated using QIIME scripts. The results will -appear in the History panel when the analysis is completed. 
- -The following outputs are captured: - - * **Vsearch_tax_OTU_table.biom|DADA2_tax_OTU_table.biom (biom format)** - The OTU table in BIOM format (http://biom-format.org/) - - * **otus.tre** Phylogenetic tree constructed using ``make_phylogeny.py`` - (fasttree) QIIME script (http://qiime.org/scripts/make_phylogeny.html) - - * **Phylum_genus_dist_barcharts_HTML** HTML file with bar charts at - Phylum, Genus and Species level - (http://qiime.org/scripts/summarize_taxa.html and - http://qiime.org/scripts/plot_taxa_summary.html) - - * **OTUs_count_file** Summary of OTU counts per sample - (http://biom-format.org/documentation/summarizing_biom_tables.html) - - * **Table_summary_file** Summary of sequences counts per sample - (http://biom-format.org/documentation/summarizing_biom_tables.html) - - * **multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta|seqs.fa** - Fasta file with OTU sequences (Vsearch|DADA2) - - * **Heatmap_PDF** OTU heatmap in PDF format - (http://qiime.org/1.8.0/scripts/make_otu_heatmap_html.html ) - - * **Vsearch_beta_diversity_weighted_2D_plots_HTML** PCoA plots in HTML - format using weighted Unifrac distance measure. Samples are grouped - by the column names present in the Metatable file. The samples are - firstly rarefied to the minimum sequencing depth - (http://qiime.org/scripts/beta_diversity_through_plots.html ) - - * **Vsearch_beta_diversity_unweighted_2D_plots_HTML** PCoA plots in HTML - format using Unweighted Unifrac distance measure. Samples are grouped - by the column names present in the Metatable file. 
The samples are - firstly rarefied to the minimum sequencing depth - (http://qiime.org/scripts/beta_diversity_through_plots.html ) - -Code availability ------------------ - -**Code is available at** https://github.com/MTutino/Amplicon_analysis - -Credits -------- - -Pipeline author: Mauro Tutino - -Galaxy tool: Peter Briggs - - ]]></help> - <citations> - <citation type="bibtex"> - @misc{githubAmplicon_analysis, - author = {Tutino, Mauro}, - year = {2017}, - title = {Amplicon Analysis Pipeline}, - publisher = {GitHub}, - journal = {GitHub repository}, - url = {https://github.com/MTutino/Amplicon_analysis}, -}</citation> - </citations> -</tool>
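The tool help above names mandatory Metatable header columns (``#SampleID``, ``BarcodeSequence``, ``LinkerPrimerSequence``, ``Description``). As an illustrative sketch only (not part of the tool; the tab-delimited example file is created inline and is hypothetical), a quick pre-upload header check could look like this:

```shell
# Sketch: verify a QIIME-format Metatable has the mandatory header columns.
# The file is a throwaway example modelled on the help text's EXAMPLE table.
meta=$(mktemp)
printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDisease\tGender\tDescription\n' > "$meta"
printf 'Mock-RUN1\tTAAGGCGAGCGTAAGA\t\tPsA\tMale\tControl\n' >> "$meta"

missing=0
header=$(head -n 1 "$meta")
for col in '#SampleID' 'BarcodeSequence' 'LinkerPrimerSequence' 'Description' ; do
    case "$header" in
        *"$col"*) : ;;  # mandatory column found in the header line
        *) echo "MISSING mandatory column: $col" ; missing=1 ;;
    esac
done
if [ "$missing" -eq 0 ] ; then
    echo "Metatable header OK"
fi
```

A check like this catches the header problems the pipeline would otherwise only report after the analysis has started.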
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis-1.3.5.sh Thu Dec 05 11:44:03 2019 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,394 +0,0 @@ -#!/bin/sh -e -# -# Prototype script to setup a conda environment with the -# dependencies needed for the Amplicon_analysis_pipeline -# script -# -# Handle command line -usage() -{ - echo "Usage: $(basename $0) [DIR]" - echo "" - echo "Installs the Amplicon_analysis_pipeline package plus" - echo "dependencies in directory DIR (or current directory " - echo "if DIR not supplied)" -} -if [ ! -z "$1" ] ; then - # Check if help was requested - case "$1" in - --help|-h) - usage - exit 0 - ;; - esac - # Assume it's the installation directory - cd $1 -fi -# Versions -PIPELINE_VERSION=1.3.5 -CONDA_REQUIRED_VERSION=4.6.14 -RDP_CLASSIFIER_VERSION=2.2 -# Directories -TOP_DIR=$(pwd)/Amplicon_analysis-${PIPELINE_VERSION} -BIN_DIR=${TOP_DIR}/bin -CONDA_DIR=${TOP_DIR}/conda -CONDA_BIN=${CONDA_DIR}/bin -CONDA_LIB=${CONDA_DIR}/lib -CONDA=${CONDA_BIN}/conda -ENV_NAME="amplicon_analysis_pipeline@${PIPELINE_VERSION}" -ENV_DIR=${CONDA_DIR}/envs/$ENV_NAME -# -# Functions -# -# Report failure and terminate script -fail() -{ - echo "" - echo ERROR $@ >&2 - echo "" - echo "$(basename $0): installation failed" - exit 1 -} -# -# Rewrite the shebangs in the installed conda scripts -# to remove the full path to conda 'bin' directory -rewrite_conda_shebangs() -{ - pattern="s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" - find ${CONDA_BIN} -type f -exec sed -i "$pattern" {} \; -} -# -# Reset conda version if required -reset_conda_version() -{ - CONDA_VERSION="$(${CONDA_BIN}/conda -V 2>&1 | head -n 1 | cut -d' ' -f2)" - echo conda version: ${CONDA_VERSION} - if [ "${CONDA_VERSION}" != "${CONDA_REQUIRED_VERSION}" ] ; then - echo "Resetting conda to last known working version $CONDA_REQUIRED_VERSION" - ${CONDA_BIN}/conda config --set allow_conda_downgrades true - ${CONDA_BIN}/conda install -y 
conda=${CONDA_REQUIRED_VERSION} - else - echo "conda version ok" - fi -} -# -# Install conda -install_conda() -{ - echo "++++++++++++++++" - echo "Installing conda" - echo "++++++++++++++++" - if [ -e ${CONDA_DIR} ] ; then - echo "*** $CONDA_DIR already exists ***" >&2 - return - fi - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh - bash ./Miniconda2-latest-Linux-x86_64.sh -b -p ${CONDA_DIR} - echo Installed conda in ${CONDA_DIR} - # Reset the conda version to a known working version - # (to avoid problems observed with e.g. conda 4.7.10) - echo "" - reset_conda_version - # Update the installation files - # This is to avoid problems when the length the installation - # directory path exceeds the limit for the shebang statement - # in the conda files - echo "" - echo -n "Rewriting conda shebangs..." - rewrite_conda_shebangs - echo "ok" - echo -n "Adding conda bin to PATH..." - PATH=${CONDA_BIN}:$PATH - echo "ok" - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# Create conda environment -install_conda_packages() -{ - echo "+++++++++++++++++++++++++" - echo "Installing conda packages" - echo "+++++++++++++++++++++++++" - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - cat >environment.yml <<EOF -name: ${ENV_NAME} -channels: - - defaults - - conda-forge - - bioconda -dependencies: - - python=2.7 - - cutadapt=1.8 - - sickle-trim=1.33 - - bioawk=1.0 - - pandaseq=2.8.1 - - spades=3.10.1 - - fastqc=0.11.3 - - qiime=1.9.1 - - blast-legacy=2.2.26 - - fasta-splitter=0.2.6 - - rdp_classifier=$RDP_CLASSIFIER_VERSION - - vsearch=2.10.4 - - r=3.5.1 - - r-tidyverse=1.2.1 - - bioconductor-dada2=1.8 - - bioconductor-biomformat=1.8.0 -EOF - ${CONDA} env create --name "${ENV_NAME}" -f environment.yml - echo Created conda environment in ${ENV_DIR} - cd $cwd - rm -rf $wd/* - rmdir $wd - # - # Patch qiime 1.9.1 tools to switch deprecated 'axisbg' - # matplotlib property to 'facecolor': - # 
https://matplotlib.org/api/prev_api_changes/api_changes_2.0.0.html - echo "" - for exe in make_2d_plots.py plot_taxa_summary.py ; do - echo -n "Patching ${exe}..." - find ${CONDA_DIR} -type f -name "$exe" -exec sed -i 's/axisbg=/facecolor=/g' {} \; - echo "done" - done - # - # Patch qiime 1.9.1 tools to switch deprecated 'set_axis_bgcolor' - # method call to 'set_facecolor': - # https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_axis_bgcolor.html - for exe in make_rarefaction_plots.py ; do - echo -n "Patching ${exe}..." - find ${CONDA_DIR} -type f -name "$exe" -exec sed -i 's/set_axis_bgcolor/set_facecolor/g' {} \; - echo "done" - done -} -# -# Install all the non-conda dependencies in a single -# function (invokes separate functions for each package) -install_non_conda_packages() -{ - echo "+++++++++++++++++++++++++++++" - echo "Installing non-conda packages" - echo "+++++++++++++++++++++++++++++" - # Temporary working directory - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - # Amplicon analysis pipeline - echo -n "Installing Amplicon_analysis_pipeline..." - if [ -e ${BIN_DIR}/Amplicon_analysis_pipeline.sh ] ; then - echo "already installed" - else - install_amplicon_analysis_pipeline - echo "ok" - fi - # ChimeraSlayer - echo -n "Installing ChimeraSlayer..." 
- if [ -e ${BIN_DIR}/ChimeraSlayer.pl ] ; then - echo "already installed" - else - install_chimeraslayer - echo "ok" - fi - # Uclust - # This no longer seems to be available for download from - # drive5.com so don't download - echo "WARNING uclust not available: skipping installation" -} -# -# Amplicon analysis pipeline -install_amplicon_analysis_pipeline() -{ - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://github.com/MTutino/Amplicon_analysis/archive/${PIPELINE_VERSION}.tar.gz - tar zxf ${PIPELINE_VERSION}.tar.gz - cd Amplicon_analysis-${PIPELINE_VERSION} - INSTALL_DIR=${TOP_DIR}/share/amplicon_analysis_pipeline-${PIPELINE_VERSION} - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/amplicon_analysis_pipeline - for f in *.sh *.R ; do - /bin/cp $f $INSTALL_DIR - done - /bin/cp -r uc2otutab $INSTALL_DIR - mkdir -p ${BIN_DIR} - cat >${BIN_DIR}/Amplicon_analysis_pipeline.sh <<EOF -#!/usr/bin/env bash -# -# Point to Qiime config -export QIIME_CONFIG_FP=${TOP_DIR}/qiime/qiime_config -# Set up the RDP jar file -export RDP_JAR_PATH=${TOP_DIR}/share/rdp_classifier/rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar -# Set the Matplotlib backend -export MPLBACKEND="agg" -# Put the scripts onto the PATH -export PATH=${BIN_DIR}:${INSTALL_DIR}:\$PATH -# Activate the conda environment -export PATH=${CONDA_BIN}:\$PATH -source ${CONDA_BIN}/activate ${ENV_NAME} -# Execute the driver script with the supplied arguments -$INSTALL_DIR/Amplicon_analysis_pipeline.sh \$@ -exit \$? 
-EOF - chmod 0755 ${BIN_DIR}/Amplicon_analysis_pipeline.sh - cat >${BIN_DIR}/install_reference_data.sh <<EOF -#!/usr/bin/env bash -e -# -function usage() { - echo "Usage: \$(basename \$0) DIR" -} -if [ -z "\$1" ] ; then - usage - exit 0 -elif [ "\$1" == "--help" ] || [ "\$1" == "-h" ] ; then - usage - echo "" - echo "Install reference data into DIR" - exit 0 -fi -echo "==========================================" -echo "Installing Amplicon analysis pipeline data" -echo "==========================================" -if [ ! -e "\$1" ] ; then - echo "Making directory \$1" - mkdir -p \$1 -fi -cd \$1 -DATA_DIR=\$(pwd) -echo "Installing reference data under \$DATA_DIR" -$INSTALL_DIR/References.sh -echo "" -echo "Use '-r \$DATA_DIR' when running Amplicon_analysis_pipeline.sh" -echo "to use the reference data from this directory" -echo "" -echo "\$(basename \$0): finished" -EOF - chmod 0755 ${BIN_DIR}/install_reference_data.sh - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# ChimeraSlayer -install_chimeraslayer() -{ - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz - tar zxf microbiomeutil_2010-04-29.tar.gz - cd microbiomeutil_2010-04-29 - INSTALL_DIR=${TOP_DIR}/share/microbiome_chimeraslayer-2010-04-29 - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/microbiome_chimeraslayer - /bin/cp -r ChimeraSlayer $INSTALL_DIR - cat >${BIN_DIR}/ChimeraSlayer.pl <<EOF -#!/usr/bin/env bash -export PATH=$INSTALL_DIR:\$PATH -$INSTALL_DIR/ChimeraSlayer/ChimeraSlayer.pl \$@ -EOF - chmod 0755 ${INSTALL_DIR}/ChimeraSlayer/ChimeraSlayer.pl - chmod 0755 ${BIN_DIR}/ChimeraSlayer.pl - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# uclust required for QIIME/pyNAST -# License only allows this version to be used with those two packages -# See: http://drive5.com/uclust/downloads1_2_22q.html -install_uclust() -{ - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q 
http://drive5.com/uclust/uclustq1.2.22_i86linux64 - INSTALL_DIR=${TOP_DIR}/share/uclust-1.2.22 - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/uclust - /bin/mv uclustq1.2.22_i86linux64 ${INSTALL_DIR}/uclust - chmod 0755 ${INSTALL_DIR}/uclust - ln -s ${INSTALL_DIR}/uclust ${BIN_DIR} - cd $cwd - rm -rf $wd/* - rmdir $wd -} -setup_pipeline_environment() -{ - echo "+++++++++++++++++++++++++++++++" - echo "Setting up pipeline environment" - echo "+++++++++++++++++++++++++++++++" - # fasta_splitter.pl - echo -n "Setting up fasta_splitter.pl..." - if [ -e ${BIN_DIR}/fasta-splitter.pl ] ; then - echo "already exists" - elif [ ! -e ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ] ; then - echo "failed" - fail "fasta-splitter.pl not found" - else - ln -s ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ${BIN_DIR}/fasta-splitter.pl - echo "ok" - fi - # rdp_classifier.jar - local rdp_classifier_jar=rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar - echo -n "Setting up rdp_classifier.jar..." - if [ -e ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} ] ; then - echo "already exists" - elif [ ! -e ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ] ; then - echo "failed" - fail "rdp_classifier.jar not found" - else - mkdir -p ${TOP_DIR}/share/rdp_classifier - ln -s ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} - echo "ok" - fi - # qiime_config - echo -n "Setting up qiime_config..." 
- if [ -e ${TOP_DIR}/qiime/qiime_config ] ; then - echo "already exists" - else - mkdir -p ${TOP_DIR}/qiime - cat >${TOP_DIR}/qiime/qiime_config <<EOF-qiime-config -qiime_scripts_dir ${ENV_DIR}/bin -EOF-qiime-config - echo "ok" - fi -} -# -# Top level script does the installation -echo "=======================================" -echo "Amplicon_analysis_pipeline installation" -echo "=======================================" -echo "Installing into ${TOP_DIR}" -if [ -e ${TOP_DIR} ] ; then - fail "Directory already exists" -fi -mkdir -p ${TOP_DIR} -install_conda -install_conda_packages -install_non_conda_packages -setup_pipeline_environment -echo "====================================" -echo "Amplicon_analysis_pipeline installed" -echo "====================================" -echo "" -echo "Install reference data using:" -echo "" -echo "\$ ${BIN_DIR}/install_reference_data.sh DIR" -echo "" -echo "Run pipeline scripts using:" -echo "" -echo "\$ ${BIN_DIR}/Amplicon_analysis_pipeline.sh ..." -echo "" -echo "(or add ${BIN_DIR} to your PATH)" -echo "" -echo "$(basename $0): finished" -## -#
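The ``rewrite_conda_shebangs`` function in the installer above hinges on a single ``sed`` substitution that swaps a hard-coded conda ``bin`` path for ``/usr/bin/env``. Its effect can be reproduced in isolation; this sketch uses a made-up ``CONDA_BIN`` path and a scratch file, not a real conda installation:

```shell
# Demonstrate the shebang rewrite performed by rewrite_conda_shebangs.
# /tmp/demo_conda/bin is a hypothetical CONDA_BIN used only for illustration.
CONDA_BIN=/tmp/demo_conda/bin
demo=$(mktemp)
printf '#!%s/python\nprint("hello")\n' "$CONDA_BIN" > "$demo"
# Same substitution pattern as the installer (comma delimiters avoid
# clashing with the slashes in the path):
sed -i "s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" "$demo"
first_line=$(head -n 1 "$demo")
echo "$first_line"   # -> #!/usr/bin/env python
rm -f "$demo"
```

This is why the rewrite sidesteps the kernel's shebang length limit: the interpreter name survives while the long absolute prefix is replaced by an ``env`` lookup on ``PATH``.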
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis.sh Thu Dec 05 11:44:03 2019 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,425 +0,0 @@ -#!/bin/sh -e -# -# Prototype script to setup a conda environment with the -# dependencies needed for the Amplicon_analysis_pipeline -# script -# -# Handle command line -usage() -{ - echo "Usage: $(basename $0) [DIR]" - echo "" - echo "Installs the Amplicon_analysis_pipeline package plus" - echo "dependencies in directory DIR (or current directory " - echo "if DIR not supplied)" -} -if [ ! -z "$1" ] ; then - # Check if help was requested - case "$1" in - --help|-h) - usage - exit 0 - ;; - esac - # Assume it's the installation directory - cd $1 -fi -# Versions -PIPELINE_VERSION=1.2.3 -RDP_CLASSIFIER_VERSION=2.2 -# Directories -TOP_DIR=$(pwd)/Amplicon_analysis-${PIPELINE_VERSION} -BIN_DIR=${TOP_DIR}/bin -CONDA_DIR=${TOP_DIR}/conda -CONDA_BIN=${CONDA_DIR}/bin -CONDA_LIB=${CONDA_DIR}/lib -CONDA=${CONDA_BIN}/conda -ENV_NAME="amplicon_analysis_pipeline@${PIPELINE_VERSION}" -ENV_DIR=${CONDA_DIR}/envs/$ENV_NAME -# -# Functions -# -# Report failure and terminate script -fail() -{ - echo "" - echo ERROR $@ >&2 - echo "" - echo "$(basename $0): installation failed" - exit 1 -} -# -# Rewrite the shebangs in the installed conda scripts -# to remove the full path to conda 'bin' directory -rewrite_conda_shebangs() -{ - pattern="s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" - find ${CONDA_BIN} -type f -exec sed -i "$pattern" {} \; -} -# -# Install conda -install_conda() -{ - echo "++++++++++++++++" - echo "Installing conda" - echo "++++++++++++++++" - if [ -e ${CONDA_DIR} ] ; then - echo "*** $CONDA_DIR already exists ***" >&2 - return - fi - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh - bash ./Miniconda2-latest-Linux-x86_64.sh -b -p ${CONDA_DIR} - echo Installed conda in ${CONDA_DIR} - # Update the installation 
files - # This is to avoid problems when the length the installation - # directory path exceeds the limit for the shebang statement - # in the conda files - echo "" - echo -n "Rewriting conda shebangs..." - rewrite_conda_shebangs - echo "ok" - echo -n "Adding conda bin to PATH..." - PATH=${CONDA_BIN}:$PATH - echo "ok" - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# Create conda environment -install_conda_packages() -{ - echo "+++++++++++++++++++++++++" - echo "Installing conda packages" - echo "+++++++++++++++++++++++++" - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - cat >environment.yml <<EOF -name: ${ENV_NAME} -channels: - - defaults - - conda-forge - - bioconda -dependencies: - - python=2.7 - - cutadapt=1.11 - - sickle-trim=1.33 - - bioawk=1.0 - - pandaseq=2.8.1 - - spades=3.5.0 - - fastqc=0.11.3 - - qiime=1.8.0 - - blast-legacy=2.2.26 - - fasta-splitter=0.2.4 - - rdp_classifier=$RDP_CLASSIFIER_VERSION - - vsearch=1.1.3 - # Need to explicitly specify libgfortran - # version (otherwise get version incompatible - # with numpy=1.7.1) - - libgfortran=1.0 - # Compilers needed to build R - - gcc_linux-64 - - gxx_linux-64 - - gfortran_linux-64 -EOF - ${CONDA} env create --name "${ENV_NAME}" -f environment.yml - echo Created conda environment in ${ENV_DIR} - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# Install all the non-conda dependencies in a single -# function (invokes separate functions for each package) -install_non_conda_packages() -{ - echo "+++++++++++++++++++++++++++++" - echo "Installing non-conda packages" - echo "+++++++++++++++++++++++++++++" - # Temporary working directory - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - # Amplicon analysis pipeline - echo -n "Installing Amplicon_analysis_pipeline..." - if [ -e ${BIN_DIR}/Amplicon_analysis_pipeline.sh ] ; then - echo "already installed" - else - install_amplicon_analysis_pipeline - echo "ok" - fi - # ChimeraSlayer - echo -n "Installing ChimeraSlayer..." 
- if [ -e ${BIN_DIR}/ChimeraSlayer.pl ] ; then - echo "already installed" - else - install_chimeraslayer - echo "ok" - fi - # Uclust - echo -n "Installing uclust for QIIME/pyNAST..." - if [ -e ${BIN_DIR}/uclust ] ; then - echo "already installed" - else - install_uclust - echo "ok" - fi - # R 3.2.1 - echo -n "Checking for R 3.2.1..." - if [ -e ${BIN_DIR}/R ] ; then - echo "R already installed" - else - echo "not found" - install_R_3_2_1 - fi -} -# -# Amplicon analysis pipeline -install_amplicon_analysis_pipeline() -{ - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://github.com/MTutino/Amplicon_analysis/archive/v${PIPELINE_VERSION}.tar.gz - tar zxf v${PIPELINE_VERSION}.tar.gz - cd Amplicon_analysis-${PIPELINE_VERSION} - INSTALL_DIR=${TOP_DIR}/share/amplicon_analysis_pipeline-${PIPELINE_VERSION} - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/amplicon_analysis_pipeline - for f in *.sh ; do - /bin/cp $f $INSTALL_DIR - done - /bin/cp -r uc2otutab $INSTALL_DIR - mkdir -p ${BIN_DIR} - cat >${BIN_DIR}/Amplicon_analysis_pipeline.sh <<EOF -#!/usr/bin/env bash -# -# Point to Qiime config -export QIIME_CONFIG_FP=${TOP_DIR}/qiime/qiime_config -# Set up the RDP jar file -export RDP_JAR_PATH=${TOP_DIR}/share/rdp_classifier/rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar -# Put the scripts onto the PATH -export PATH=${BIN_DIR}:${INSTALL_DIR}:\$PATH -# Activate the conda environment -export PATH=${CONDA_BIN}:\$PATH -source ${CONDA_BIN}/activate ${ENV_NAME} -# Execute the driver script with the supplied arguments -$INSTALL_DIR/Amplicon_analysis_pipeline.sh \$@ -exit \$? 
-EOF - chmod 0755 ${BIN_DIR}/Amplicon_analysis_pipeline.sh - cat >${BIN_DIR}/install_reference_data.sh <<EOF -#!/usr/bin/env bash -e -# -function usage() { - echo "Usage: \$(basename \$0) DIR" -} -if [ -z "\$1" ] ; then - usage - exit 0 -elif [ "\$1" == "--help" ] || [ "\$1" == "-h" ] ; then - usage - echo "" - echo "Install reference data into DIR" - exit 0 -fi -echo "==========================================" -echo "Installing Amplicon analysis pipeline data" -echo "==========================================" -if [ ! -e "\$1" ] ; then - echo "Making directory \$1" - mkdir -p \$1 -fi -cd \$1 -DATA_DIR=\$(pwd) -echo "Installing reference data under \$DATA_DIR" -$INSTALL_DIR/References.sh -echo "" -echo "Use '-r \$DATA_DIR' when running Amplicon_analysis_pipeline.sh" -echo "to use the reference data from this directory" -echo "" -echo "\$(basename \$0): finished" -EOF - chmod 0755 ${BIN_DIR}/install_reference_data.sh - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# ChimeraSlayer -install_chimeraslayer() -{ - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz - tar zxf microbiomeutil_2010-04-29.tar.gz - cd microbiomeutil_2010-04-29 - INSTALL_DIR=${TOP_DIR}/share/microbiome_chimeraslayer-2010-04-29 - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/microbiome_chimeraslayer - /bin/cp -r ChimeraSlayer $INSTALL_DIR - cat >${BIN_DIR}/ChimeraSlayer.pl <<EOF -#!/usr/bin/env bash -export PATH=$INSTALL_DIR:\$PATH -$INSTALL_DIR/ChimeraSlayer/ChimeraSlayer.pl \$@ -EOF - chmod 0755 ${INSTALL_DIR}/ChimeraSlayer/ChimeraSlayer.pl - chmod 0755 ${BIN_DIR}/ChimeraSlayer.pl - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# uclust required for QIIME/pyNAST -# License only allows this version to be used with those two packages -# See: http://drive5.com/uclust/downloads1_2_22q.html -install_uclust() -{ - local wd=$(mktemp -d) - local cwd=$(pwd) - local wd=$(mktemp -d) - cd 
$wd - wget -q http://drive5.com/uclust/uclustq1.2.22_i86linux64 - INSTALL_DIR=${TOP_DIR}/share/uclust-1.2.22 - mkdir -p $INSTALL_DIR - ln -s $INSTALL_DIR ${TOP_DIR}/share/uclust - /bin/mv uclustq1.2.22_i86linux64 ${INSTALL_DIR}/uclust - chmod 0755 ${INSTALL_DIR}/uclust - ln -s ${INSTALL_DIR}/uclust ${BIN_DIR} - cd $cwd - rm -rf $wd/* - rmdir $wd -} -# -# R 3.2.1 -# Can't use version from conda due to dependency conflicts -install_R_3_2_1() -{ - . ${CONDA_BIN}/activate ${ENV_NAME} - local cwd=$(pwd) - local wd=$(mktemp -d) - cd $wd - echo -n "Fetching R 3.2.1 source code..." - wget -q http://cran.r-project.org/src/base/R-3/R-3.2.1.tar.gz - echo "ok" - INSTALL_DIR=${TOP_DIR} - mkdir -p $INSTALL_DIR - echo -n "Unpacking source code..." - tar xzf R-3.2.1.tar.gz >INSTALL.log 2>&1 - echo "ok" - cd R-3.2.1 - echo -n "Running configure..." - ./configure --prefix=$INSTALL_DIR --with-x=no --with-readline=no >>INSTALL.log 2>&1 - echo "ok" - echo -n "Running make..." - make >>INSTALL.log 2>&1 - echo "ok" - echo -n "Running make install..." - make install >>INSTALL.log 2>&1 - echo "ok" - cd $cwd - rm -rf $wd/* - rmdir $wd - . ${CONDA_BIN}/deactivate -} -setup_pipeline_environment() -{ - echo "+++++++++++++++++++++++++++++++" - echo "Setting up pipeline environment" - echo "+++++++++++++++++++++++++++++++" - # vsearch113 - echo -n "Setting up vsearch113..." - if [ -e ${BIN_DIR}/vsearch113 ] ; then - echo "already exists" - elif [ ! -e ${ENV_DIR}/bin/vsearch ] ; then - echo "failed" - fail "vsearch not found" - else - ln -s ${ENV_DIR}/bin/vsearch ${BIN_DIR}/vsearch113 - echo "ok" - fi - # fasta_splitter.pl - echo -n "Setting up fasta_splitter.pl..." - if [ -e ${BIN_DIR}/fasta-splitter.pl ] ; then - echo "already exists" - elif [ ! 
-e ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ] ; then - echo "failed" - fail "fasta-splitter.pl not found" - else - ln -s ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ${BIN_DIR}/fasta-splitter.pl - echo "ok" - fi - # rdp_classifier.jar - local rdp_classifier_jar=rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar - echo -n "Setting up rdp_classifier.jar..." - if [ -e ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} ] ; then - echo "already exists" - elif [ ! -e ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ] ; then - echo "failed" - fail "rdp_classifier.jar not found" - else - mkdir -p ${TOP_DIR}/share/rdp_classifier - ln -s ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} - echo "ok" - fi - # qiime_config - echo -n "Setting up qiime_config..." - if [ -e ${TOP_DIR}/qiime/qiime_config ] ; then - echo "already exists" - else - mkdir -p ${TOP_DIR}/qiime - cat >${TOP_DIR}/qiime/qiime_config <<EOF-qiime-config -qiime_scripts_dir ${ENV_DIR}/bin -EOF-qiime-config - echo "ok" - fi -} -# -# Remove the compilers from the conda environment -# Not sure if this step is necessary -remove_conda_compilers() -{ - echo "+++++++++++++++++++++++++++++++++++++++++" - echo "Removing compilers from conda environment" - echo "+++++++++++++++++++++++++++++++++++++++++" - ${CONDA} remove -y -n ${ENV_NAME} gcc_linux-64 gxx_linux-64 gfortran_linux-64 -} -# -# Top level script does the installation -echo "=======================================" -echo "Amplicon_analysis_pipeline installation" -echo "=======================================" -echo "Installing into ${TOP_DIR}" -if [ -e ${TOP_DIR} ] ; then - fail "Directory already exists" -fi -mkdir -p ${TOP_DIR} -install_conda -install_conda_packages -install_non_conda_packages -setup_pipeline_environment -remove_conda_compilers -echo "====================================" -echo "Amplicon_analysis_pipeline installed" -echo "====================================" -echo "" 
-echo "Install reference data using:" -echo "" -echo "\$ ${BIN_DIR}/install_reference_data.sh DIR" -echo "" -echo "Run pipeline scripts using:" -echo "" -echo "\$ ${BIN_DIR}/Amplicon_analysis_pipeline.sh ..." -echo "" -echo "(or add ${BIN_DIR} to your PATH)" -echo "" -echo "$(basename $0): finished" -## -#
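The installer above repeatedly uses one pattern worth noting: an unquoted heredoc writes a small wrapper script, so install-time values such as `$INSTALL_DIR` are baked in when the wrapper is generated, while escaped references (`\$PATH`, `\$0`) are left to expand at run time. A standalone sketch of that pattern (all paths here are illustrative stand-ins, not the pipeline's real ones):

```shell
# Demonstrates the heredoc wrapper-generation pattern used by the installer.
# INSTALL_DIR is a stand-in value; BIN_DIR is a scratch directory.
INSTALL_DIR=/tmp/demo_install
BIN_DIR=$(mktemp -d)
cat >"${BIN_DIR}/demo_tool.sh" <<EOF
#!/usr/bin/env bash
# \$INSTALL_DIR below was expanded when this wrapper was written;
# \$PATH was escaped, so it expands each time the wrapper runs.
export PATH=$INSTALL_DIR:\$PATH
echo "PATH starts with: \${PATH%%:*}"
EOF
chmod 0755 "${BIN_DIR}/demo_tool.sh"
"${BIN_DIR}/demo_tool.sh"   # prints: PATH starts with: /tmp/demo_install
```

The same quoting rules explain why the generated `Amplicon_analysis_pipeline.sh` and `install_reference_data.sh` wrappers mix escaped and unescaped variables.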
Binary file Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig1.png has changed
Binary file Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig2.png has changed
Binary file Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/static/images/Pipeline_description_Fig3.png has changed
--- a/Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/tool_dependencies.xml Thu Dec 05 11:44:03 2019 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,16 +0,0 @@ -<?xml version="1.0"?> -<tool_dependency> - <package name="amplicon_analysis_pipeline" version="1.3.5"> - <install version="1.0"> - <actions> - <action type="download_file">https://raw.githubusercontent.com/pjbriggs/Amplicon_analysis-galaxy/update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis-1.3.5.sh</action> - <action type="shell_command"> - sh ./install_amplicon_analysis.sh $INSTALL_DIR - </action> - <action type="set_environment"> - <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/Amplicon_analysis-1.3.5/bin</environment_variable> - </action> - </actions> - </install> - </package> -</tool_dependency>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README.rst Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,213 @@ +Amplicon_analysis-galaxy +======================== + +A Galaxy tool wrapper to Mauro Tutino's ``Amplicon_analysis`` pipeline +script at https://github.com/MTutino/Amplicon_analysis + +The pipeline can analyse paired-end 16S rRNA data from Illumina Miseq +(Casava >= 1.8) and performs the following operations: + + * QC and clean up of input data + * Removal of singletons and chimeras and building of OTU table + and phylogenetic tree + * Beta and alpha diversity analysis + +Usage documentation +=================== + +Usage of the tool (including required inputs) is documented within +the ``help`` section of the tool XML. + +Installing the tool in a Galaxy instance +======================================== + +The following sections describe how to install the tool files, +dependencies and reference data, and how to configure the Galaxy +instance to detect the dependencies and reference data correctly +at run time. + +1. Install the tool from the toolshed +------------------------------------- + +The core tool is hosted on the Galaxy toolshed, so it can be installed +directly from there (this is the recommended route): + + * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/ + +Alternatively it can be installed manually; in this case there are two +files to install: + + * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition) + * ``amplicon_analysis_pipeline.py`` (the Python wrapper script) + +Put these in a directory that is visible to Galaxy (e.g. a +``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml`` +file to tell Galaxy to offer the tool by adding a line e.g.:: + + <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" /> + +2. 
Install the reference data +----------------------------- + +The script ``References.sh`` from the pipeline package at +https://github.com/MTutino/Amplicon_analysis can be run to install +the reference data, for example:: + + cd /path/to/pipeline/data + wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh + /bin/bash ./References.sh + +will install the data in ``/path/to/pipeline/data``. + +**NB** The final amount of data downloaded and uncompressed will be +around 9GB. + +3. Configure reference data location in Galaxy +---------------------------------------------- + +The final step is to make your Galaxy installation aware of the +location of the reference data, so it can locate it when the +tool is run. + +The tool locates the reference data via an environment variable called +``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent +directory where the reference data has been installed. + +There are various ways to do this, depending on how your Galaxy +installation is configured: + + * **For local instances:** add a line to set it in the + ``config/local_env.sh`` file of your Galaxy installation (you + may need to create a new empty file first), e.g.:: + + export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data + + * **For production instances:** set the value in the ``job_conf.xml`` + configuration file, e.g.:: + + <destination id="amplicon_analysis"> + <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env> + </destination> + + and then specify that the pipeline tool uses this destination:: + + <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/> + + (For more about job destinations see the Galaxy documentation at + https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations) + +4. 
Enable rendering of HTML outputs from pipeline +------------------------------------------------- + +To ensure that HTML outputs are displayed correctly in Galaxy +(for example the Vsearch OTU table heatmaps), Galaxy needs to be +configured not to sanitize the outputs from the ``Amplicon_analysis`` +tool. + +Either: + + * **For local instances:** set ``sanitize_all_html = False`` in + ``config/galaxy.ini`` (nb don't do this on production servers or + public instances!); or + + * **For production instances:** add the ``Amplicon_analysis`` tool + to the display whitelist in the Galaxy instance: + + - Set ``sanitize_whitelist_file = config/whitelist.txt`` in + ``config/galaxy.ini`` and restart Galaxy; + - Go to ``Admin>Manage Display Whitelist``, check the box for + ``Amplicon_analysis`` (hint: use your browser's 'find-in-page' + search function to help locate it) and click on + ``Submit new whitelist`` to update the settings. + +Additional details +================== + +Some other things to be aware of: + + * Note that using the Silva database requires a minimum of 18Gb RAM + +Known problems +============== + + * Only the ``VSEARCH`` pipeline in Mauro's script is currently + available via the Galaxy tool; the ``USEARCH`` and ``QIIME`` + pipelines have yet to be implemented. + * The images in the tool help section are not visible if the + tool has been installed locally, or if it has been installed in + a Galaxy instance which is served from a subdirectory. + + These are both problems with Galaxy and not the tool, see + https://github.com/galaxyproject/galaxy/issues/4490 and + https://github.com/galaxyproject/galaxy/issues/1676 + +Appendix: installing the dependencies manually +============================================== + +If the tool is installed from the Galaxy toolshed (recommended) then +the dependencies should be installed automatically and this step can +be skipped. 
+ +Otherwise the ``install_amplicon_analysis.sh`` script can be used +to fetch and install the dependencies locally, for example:: + + install_amplicon_analysis.sh /path/to/local_tool_dependencies + +(This is the same script as is used to install dependencies from the +toolshed.) This can take some time to complete, and when completed will +have created a directory called ``Amplicon_analysis-1.2.3`` containing +the dependencies under the specified top level directory. + +**NB** The installed dependencies will occupy around 2.6GB of disk +space. + +You will need to make sure that the ``bin`` subdirectory of this +directory is on Galaxy's ``PATH`` at runtime, for the tool to be able +to access the dependencies - for example by adding a line to the +``local_env.sh`` file like:: + + export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH + +History +======= + +========== ====================================================================== +Version Changes +---------- ---------------------------------------------------------------------- +1.3.5.0 Updated to Amplicon_Analysis_Pipeline version 1.3.5. +1.2.3.0 Updated to Amplicon_Analysis_Pipeline version 1.2.3; install + dependencies via tool_dependencies.xml. +1.2.2.0 Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes + jackknifed analysis which is not captured by Galaxy tool) +1.2.1.0 Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds + option to use the Human Oral Microbiome Database v15.1, and + updates SILVA database to v123) +1.1.0 First official version on Galaxy toolshed. +1.0.6 Expand inline documentation to provide detailed usage guidance. 
+1.0.5 Updates including: + + - Capture read counts from quality control as new output dataset + - Capture FastQC per-base quality boxplots for each sample as + new output dataset + - Add support for -l option (sliding window length for trimming) + - Default for -L set to "200" +1.0.4 Various updates: + + - Additional outputs are captured when a "Categories" file is + supplied (alpha diversity rarefaction curves and boxplots) + - Sample names derived from Fastqs in a collection of pairs + are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames) + - Input Fastqs can now be of more general ``fastq`` type + - Log file outputs are captured in new output dataset + - User can specify a "title" for the job which is copied into + the dataset names (to distinguish outputs from different runs) + - Improved detection and reporting of problems with input + Metatable +1.0.3 Take the sample names from the collection dataset names when + using collection as input (this is now the default input mode); + collect additional output dataset; disable ``usearch``-based + pipelines (i.e. ``UPARSE`` and ``QIIME``). +1.0.2 Enable support for FASTQs supplied via dataset collections and + fix some broken output datasets. +1.0.1 Initial version +========== ======================================================================
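The README's configuration steps for a local instance reduce to two environment settings; a consolidated sketch of a `config/local_env.sh` fragment (both paths are placeholders to adjust to your installation):

```shell
# Placeholder paths -- point these at your actual reference-data and
# dependency install locations before use.
export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data
export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH
```

Galaxy sources this file at startup, so both the wrapper script and the pipeline dependencies become visible to tool jobs.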
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/amplicon_analysis_pipeline.py Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,370 @@ +#!/usr/bin/env python +# +# Wrapper script to run Amplicon_analysis_pipeline.sh +# from Galaxy tool + +import sys +import os +import argparse +import subprocess +import glob + +class PipelineCmd(object): + def __init__(self,cmd): + self.cmd = [str(cmd)] + def add_args(self,*args): + for arg in args: + self.cmd.append(str(arg)) + def __repr__(self): + return ' '.join([str(arg) for arg in self.cmd]) + +def ahref(target,name=None,type=None): + if name is None: + name = os.path.basename(target) + ahref = "<a href='%s'" % target + if type is not None: + ahref += " type='%s'" % type + ahref += ">%s</a>" % name + return ahref + +def check_errors(): + # Errors in Amplicon_analysis_pipeline.log + with open('Amplicon_analysis_pipeline.log','r') as pipeline_log: + log = pipeline_log.read() + if "Names in the first column of Metatable.txt and in the second column of Final_name.txt do not match" in log: + print_error("""*** Sample IDs don't match dataset names *** + +The sample IDs (first column of the Metatable file) don't match the +supplied sample names for the input Fastq pairs. 
+""") + # Errors in pipeline output + with open('pipeline.log','r') as pipeline_log: + log = pipeline_log.read() + if "Errors and/or warnings detected in mapping file" in log: + with open("Metatable_log/Metatable.log","r") as metatable_log: + # Echo the Metatable log file to the tool log + print_error("""*** Error in Metatable mapping file *** + +%s""" % metatable_log.read()) + elif "No header line was found in mapping file" in log: + # Report error to the tool log + print_error("""*** No header in Metatable mapping file *** + +Check you've specified the correct file as the input Metatable""") + +def print_error(message): + width = max([len(line) for line in message.split('\n')]) + 4 + sys.stderr.write("\n%s\n" % ('*'*width)) + for line in message.split('\n'): + sys.stderr.write("* %s%s *\n" % (line,' '*(width-len(line)-4))) + sys.stderr.write("%s\n\n" % ('*'*width)) + +def clean_up_name(sample): + # Remove extensions and trailing "_L[0-9]+_001" from + # Fastq pair names + sample_name = '.'.join(sample.split('.')[:1]) + split_name = sample_name.split('_') + if split_name[-1] == "001": + split_name = split_name[:-1] + if split_name[-1].startswith('L'): + try: + int(split_name[-1][1:]) + split_name = split_name[:-1] + except ValueError: + pass + return '_'.join(split_name) + +def list_outputs(filen=None): + # List the output directory contents + # If filen is specified then will be the filename to + # write to, otherwise write to stdout + if filen is not None: + fp = open(filen,'w') + else: + fp = sys.stdout + results_dir = os.path.abspath("RESULTS") + fp.write("Listing contents of output dir %s:\n" % results_dir) + ix = 0 + for d,dirs,files in os.walk(results_dir): + ix += 1 + fp.write("-- %d: %s\n" % (ix, + os.path.relpath(d,results_dir))) + for f in files: + ix += 1 + fp.write("---- %d: %s\n" % (ix, + os.path.relpath(f,results_dir))) + # Close output file + if filen is not None: + fp.close() + +if __name__ == "__main__": + # Command line + print "Amplicon 
analysis: starting" + p = argparse.ArgumentParser() + p.add_argument("metatable", + metavar="METATABLE_FILE", + help="Metatable.txt file") + p.add_argument("fastq_pairs", + metavar="SAMPLE_NAME FQ_R1 FQ_R2", + nargs="+", + default=list(), + help="Triplets of SAMPLE_NAME followed by " + "a R1/R2 FASTQ file pair") + p.add_argument("-g",dest="forward_pcr_primer") + p.add_argument("-G",dest="reverse_pcr_primer") + p.add_argument("-q",dest="trimming_threshold") + p.add_argument("-O",dest="minimum_overlap") + p.add_argument("-L",dest="minimum_length") + p.add_argument("-l",dest="sliding_window_length") + p.add_argument("-P",dest="pipeline", + choices=["Vsearch","DADA2"], + type=str, + default="Vsearch") + p.add_argument("-S",dest="use_silva",action="store_true") + p.add_argument("-H",dest="use_homd",action="store_true") + p.add_argument("-r",dest="reference_data_path") + p.add_argument("-c",dest="categories_file") + args = p.parse_args() + + # Build the environment for running the pipeline + print "Amplicon analysis: building the environment" + metatable_file = os.path.abspath(args.metatable) + os.symlink(metatable_file,"Metatable.txt") + print "-- made symlink to Metatable.txt" + + # Link to Categories.txt file (if provided) + if args.categories_file is not None: + categories_file = os.path.abspath(args.categories_file) + os.symlink(categories_file,"Categories.txt") + print "-- made symlink to Categories.txt" + + # Link to FASTQs and construct Final_name.txt file + sample_names = [] + print "-- making Final_name.txt" + with open("Final_name.txt",'w') as final_name: + fastqs = iter(args.fastq_pairs) + for sample_name,fqr1,fqr2 in zip(fastqs,fastqs,fastqs): + sample_name = clean_up_name(sample_name) + print " %s" % sample_name + r1 = "%s_R1_.fastq" % sample_name + r2 = "%s_R2_.fastq" % sample_name + os.symlink(fqr1,r1) + os.symlink(fqr2,r2) + final_name.write("%s\n" % '\t'.join((r1,sample_name))) + final_name.write("%s\n" % '\t'.join((r2,sample_name))) + 
sample_names.append(sample_name) + + # Reference database + if args.use_silva: + ref_database = "silva" + elif args.use_homd: + ref_database = "homd" + else: + ref_database = "gg" + + # Construct the pipeline command + print "Amplicon analysis: constructing pipeline command" + pipeline = PipelineCmd("Amplicon_analysis_pipeline.sh") + if args.forward_pcr_primer: + pipeline.add_args("-g",args.forward_pcr_primer) + if args.reverse_pcr_primer: + pipeline.add_args("-G",args.reverse_pcr_primer) + if args.trimming_threshold: + pipeline.add_args("-q",args.trimming_threshold) + if args.minimum_overlap: + pipeline.add_args("-O",args.minimum_overlap) + if args.minimum_length: + pipeline.add_args("-L",args.minimum_length) + if args.sliding_window_length: + pipeline.add_args("-l",args.sliding_window_length) + if args.reference_data_path: + pipeline.add_args("-r",args.reference_data_path) + pipeline.add_args("-P",args.pipeline) + if ref_database == "silva": + pipeline.add_args("-S") + elif ref_database == "homd": + pipeline.add_args("-H") + + # Echo the pipeline command to stdout + print "Running %s" % pipeline + + # Run the pipeline + with open("pipeline.log","w") as pipeline_out: + try: + subprocess.check_call(pipeline.cmd, + stdout=pipeline_out, + stderr=subprocess.STDOUT) + exit_code = 0 + print "Pipeline completed ok" + except subprocess.CalledProcessError as ex: + # Non-zero exit status + sys.stderr.write("Pipeline failed: exit code %s\n" % + ex.returncode) + exit_code = ex.returncode + except Exception as ex: + # Some other problem + sys.stderr.write("Unexpected error: %s\n" % str(ex)) + exit_code = 1 + + # Write out the list of outputs + outputs_file = "Pipeline_outputs.txt" + list_outputs(outputs_file) + + # Check for log file + log_file = "Amplicon_analysis_pipeline.log" + if os.path.exists(log_file): + print "Found log file: %s" % log_file + if exit_code == 0: + # Create an HTML file to link to log files etc + # NB the paths to the files should be correct once + # 
copied by Galaxy on job completion + with open("pipeline_outputs.html","w") as html_out: + html_out.write("""<html> +<head> +<title>Amplicon analysis pipeline: log files</title> +</head> +<body> +<h1>Amplicon analysis pipeline: log files</h1> +<ul> +""") + html_out.write( + "<li>%s</li>\n" % + ahref("Amplicon_analysis_pipeline.log", + type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("pipeline.log",type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("Pipeline_outputs.txt", + type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("Metatable.html")) + html_out.write("""</ul> +</body> +</html> +""") + else: + # Check for known error messages + check_errors() + # Write pipeline stdout to tool stderr + sys.stderr.write("\nOutput from pipeline:\n") + with open("pipeline.log",'r') as log: + sys.stderr.write("%s" % log.read()) + # Write log file contents to tool log + print "\nAmplicon_analysis_pipeline.log:" + with open(log_file,'r') as log: + print "%s" % log.read() + else: + sys.stderr.write("ERROR missing log file \"%s\"\n" % + log_file) + + # Handle FastQC boxplots + print "Amplicon analysis: collating per base quality boxplots" + with open("fastqc_quality_boxplots.html","w") as quality_boxplots: + # PHRED value for trimming + phred_score = 20 + if args.trimming_threshold is not None: + phred_score = args.trimming_threshold + # Write header for HTML output file + quality_boxplots.write("""<html> +<head> +<title>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</title> +</head> +<body> +<h1>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</h1> +""") + # Look for raw and trimmed FastQC output for each sample + for sample_name in sample_names: + fastqc_dir = os.path.join(sample_name,"FastQC") + quality_boxplots.write("<h2>%s</h2>" % sample_name) + for d in ("Raw","cutdapt_sickle/Q%s" % phred_score): + quality_boxplots.write("<h3>%s</h3>" % d) + fastqc_html_files = glob.glob( + 
os.path.join(fastqc_dir,d,"*_fastqc.html")) + if not fastqc_html_files: + quality_boxplots.write("<p>No FastQC outputs found</p>") + continue + # Pull out the per-base quality boxplots + for f in fastqc_html_files: + boxplot = None + with open(f) as fp: + for line in fp.read().split(">"): + try: + line.index("alt=\"Per base quality graph\"") + boxplot = line + ">" + break + except ValueError: + pass + if boxplot is None: + boxplot = "Missing plot" + quality_boxplots.write("<h4>%s</h4><p>%s</p>" % + (os.path.basename(f), + boxplot)) + quality_boxplots.write("""</body> +</html> +""") + + # Handle DADA2 error rate plot PDFs + if args.pipeline == "DADA2": + print("Amplicon analysis: collecting error rate plots") + error_rate_plots_dir = os.path.abspath( + os.path.join("DADA2_OTU_tables", + "Error_rate_plots")) + error_rate_plot_pdfs = [os.path.basename(pdf) + for pdf in + sorted(glob.glob( + os.path.join(error_rate_plots_dir,"*.pdf")))] + with open("error_rate_plots.html","w") as error_rate_plots_out: + error_rate_plots_out.write("""<html> +<head> +<title>Amplicon analysis pipeline: DADA2 Error Rate Plots</title> +</head> +<body> +<h1>Amplicon analysis pipeline: DADA2 Error Rate Plots</h1> +""") + error_rate_plots_out.write("<ul>\n") + for pdf in error_rate_plot_pdfs: + error_rate_plots_out.write("<li>%s</li>\n" % ahref(pdf)) + error_rate_plots_out.write("</ul>\n") + error_rate_plots_out.write("""</body> +</html> +""") + + # Handle additional output when categories file was supplied + if args.categories_file is not None: + # Alpha diversity boxplots + print "Amplicon analysis: indexing alpha diversity boxplots" + boxplots_dir = os.path.abspath( + os.path.join("RESULTS", + "%s_%s" % (args.pipeline, + ref_database), + "Alpha_diversity", + "Alpha_diversity_boxplot", + "Categories_shannon")) + print "Amplicon analysis: gathering PDFs from %s" % boxplots_dir + boxplot_pdfs = [os.path.basename(pdf) + for pdf in + sorted(glob.glob( + os.path.join(boxplots_dir,"*.pdf")))] + 
with open("alpha_diversity_boxplots.html","w") as boxplots_out: + boxplots_out.write("""<html> +<head> +<title>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</title> +</head> +<body> +<h1>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</h1> +""") + boxplots_out.write("<ul>\n") + for pdf in boxplot_pdfs: + boxplots_out.write("<li>%s</li>\n" % ahref(pdf)) + boxplots_out.write("</ul>\n") + boxplots_out.write("""</body> +</html> +""") + + # Finish + print "Amplicon analysis: finishing, exit code: %s" % exit_code + sys.exit(exit_code)
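The sample-name trimming performed by the wrapper's `clean_up_name` can be exercised in isolation. This sketch mirrors the wrapper's logic (rewritten for Python 3, unlike the Python 2 wrapper above; sample names are illustrative):

```python
def clean_up_name(sample):
    # Drop file extensions, then strip a trailing "_001" and "_L<nnn>"
    # component typical of Illumina-style Fastq names (mirrors the
    # wrapper's clean_up_name logic).
    sample_name = sample.split('.')[0]
    parts = sample_name.split('_')
    if parts[-1] == "001":
        parts = parts[:-1]
    if parts[-1].startswith('L'):
        try:
            int(parts[-1][1:])
            parts = parts[:-1]
        except ValueError:
            pass
    return '_'.join(parts)

print(clean_up_name("PC1_S1_L001_001"))        # -> PC1_S1
print(clean_up_name("Sample2_S4_L002.fastq"))  # -> Sample2_S4
```

Note that only one trailing lane component is stripped, which matches the README's statement that names are trimmed to the `SAMPLE_S*` form for Illumina-style filenames.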
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/amplicon_analysis_pipeline.xml Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,502 @@ +<tool id="amplicon_analysis_pipeline" name="Amplicon Analysis Pipeline" version="1.3.5.0"> + <description>analyse 16S rRNA data from Illumina Miseq paired-end reads</description> + <requirements> + <requirement type="package" version="1.3.5">amplicon_analysis_pipeline</requirement> + </requirements> + <stdio> + <exit_code range="1:" /> + </stdio> + <command><![CDATA[ + + ## Convenience variable for pipeline name + #set $pipeline_name = $pipeline.pipeline_name + + ## Set the reference database name + #if str( $pipeline_name ) == "DADA2" + #set reference_database_name = "silva" + #else + #set reference_database = $pipeline.reference_database + #if $reference_database == "-S" + #set reference_database_name = "silva" + #else if $reference_database == "-H" + #set reference_database_name = "homd" + #else + #set reference_database_name = "gg" + #end if + #end if + + ## Run the amplicon analysis pipeline wrapper + python $__tool_directory__/amplicon_analysis_pipeline.py + ## Set options + #if str( $forward_pcr_primer ) != "" + -g "$forward_pcr_primer" + #end if + #if str( $reverse_pcr_primer ) != "" + -G "$reverse_pcr_primer" + #end if + #if str( $trimming_threshold ) != "" + -q $trimming_threshold + #end if + #if str( $sliding_window_length ) != "" + -l $sliding_window_length + #end if + #if str( $minimum_overlap ) != "" + -O $minimum_overlap + #end if + #if str( $minimum_length ) != "" + -L $minimum_length + #end if + -P $pipeline_name + -r \${AMPLICON_ANALYSIS_REF_DATA_PATH-ReferenceData} + #if str( $pipeline_name ) != "DADA2" + ${reference_database} + #end if + #if str($categories_file_in) != 'None' + -c "${categories_file_in}" + #end if + ## Input files + "${metatable_file_in}" + ## FASTQ pairs + #if str($input_type.pairs_or_collection) == "collection" + #set fastq_pairs = $input_type.fastq_collection + #else + #set fastq_pairs = 
$input_type.fastq_pairs + #end if + #for $fq_pair in $fastq_pairs + "${fq_pair.name}" "${fq_pair.forward}" "${fq_pair.reverse}" + #end for + && + + ## Collect outputs + cp Metatable_log/Metatable_mod.txt "${metatable_mod}" && + #if str( $pipeline_name ) == "Vsearch" + # Vsearch-specific + cp ${pipeline_name}_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom "${tax_otu_table_biom_file}" && + cp Multiplexed_files/${pipeline_name}_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta "${dereplicated_nonchimera_otus_fasta}" && + cp QUALITY_CONTROL/Reads_count.txt "$read_counts_out" && + #else + # DADA2-specific + cp ${pipeline_name}_OTU_tables/DADA2_tax_OTU_table.biom "${tax_otu_table_biom_file}" && + cp ${pipeline_name}_OTU_tables/seqs.fa "${dereplicated_nonchimera_otus_fasta}" && + #end if + cp ${pipeline_name}_OTU_tables/otus.tre "${otus_tre_file}" && + cp RESULTS/${pipeline_name}_${reference_database_name}/OTUs_count.txt "${otus_count_file}" && + cp RESULTS/${pipeline_name}_${reference_database_name}/table_summary.txt "${table_summary_file}" && + cp fastqc_quality_boxplots.html "${fastqc_quality_boxplots_html}" && + + ## OTU table heatmap + cp RESULTS/${pipeline_name}_${reference_database_name}/Heatmap.pdf "${heatmap_otu_table_pdf}"" && + + ## HTML outputs + + ## Phylum genus barcharts + mkdir $phylum_genus_dist_barcharts_html.files_path && + cp -r RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/charts $phylum_genus_dist_barcharts_html.files_path && + cp -r RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/raw_data $phylum_genus_dist_barcharts_html.files_path && + cp RESULTS/${pipeline_name}_${reference_database_name}/phylum_genus_charts/bar_charts.html "${phylum_genus_dist_barcharts_html}" && + + ## Beta diversity weighted 2d plots + mkdir $beta_div_even_weighted_2d_plots.files_path && + cp -r 
RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/weighted_2d_plot/* $beta_div_even_weighted_2d_plots.files_path && + cp RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_weighted_2d_plots}" && + + ## Beta diversity unweighted 2d plots + mkdir $beta_div_even_unweighted_2d_plots.files_path && + cp -r RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/unweighted_2d_plot/* $beta_div_even_unweighted_2d_plots.files_path && + cp RESULTS/${pipeline_name}_${reference_database_name}/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_unweighted_2d_plots}" && + + ## Alpha diversity rarefaction plots + mkdir $alpha_div_rarefaction_plots.files_path && + cp RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/rarefaction_curves/rarefaction_plots.html $alpha_div_rarefaction_plots && + cp -r RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/rarefaction_curves/average_plots $alpha_div_rarefaction_plots.files_path && + + ## DADA2 error rate plots + #if str($pipeline_name) == "DADA2" + mkdir $dada2_error_rate_plots.files_path && + cp DADA2_OTU_tables/Error_rate_plots/error_rate_plots.html $dada2_error_rate_plots && + cp -r DADA2_OTU_tables/Error_rate_plots/*.pdf $dada2_error_rate_plots.files_path && + #end if + + ## Categories data + #if str($categories_file_in) != 'None' + ## Alpha diversity boxplots + mkdir $alpha_div_boxplots.files_path && + cp alpha_diversity_boxplots.html "$alpha_div_boxplots" && + cp RESULTS/${pipeline_name}_${reference_database_name}/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf $alpha_div_boxplots.files_path && + #end if + + ## Pipeline outputs (log files etc) + mkdir $log_files.files_path && + cp Amplicon_analysis_pipeline.log $log_files.files_path && + cp pipeline.log $log_files.files_path && + cp Pipeline_outputs.txt $log_files.files_path && + cp 
Metatable_log/Metatable.html $log_files.files_path && + cp pipeline_outputs.html "$log_files" + ]]></command> + <inputs> + <param name="title" type="text" value="test" size="25" + label="Title" help="Optional text that will be added to the output dataset names" /> + <param type="data" name="metatable_file_in" format="tabular" + label="Input Metatable.txt file" /> + <param type="data" name="categories_file_in" format="txt" + label="Input Categories.txt file" optional="true" + help="(optional)" /> + <conditional name="input_type"> + <param name="pairs_or_collection" type="select" + label="Input FASTQ type"> + <option value="pairs_of_files">Pairs of datasets</option> + <option value="collection" selected="true">Dataset pairs in a collection</option> + </param> + <when value="collection"> + <param name="fastq_collection" type="data_collection" + format="fastqsanger,fastq" collection_type="list:paired" + label="Collection of FASTQ forward and reverse (R1/R2) pairs" + help="Each FASTQ pair will be treated as one sample; the name of each sample will be taken from the first column of the Metatable file " /> + </when> + <when value="pairs_of_files"> + <repeat name="fastq_pairs" title="Input fastq pairs" min="1"> + <param type="text" name="name" value="" + label="Final name for FASTQ pair" /> + <param type="data" name="fastq_r1" format="fastqsanger,fastq" + label="FASTQ with forward reads (R1)" /> + <param type="data" name="fastq_r2" format="fastqsanger,fastq" + label="FASTQ with reverse reads (R2)" /> + </repeat> + </when> + </conditional> + <param type="text" name="forward_pcr_primer" value="" + label="Forward PCR primer sequence" + help="Optional; must not include barcode or adapter sequence (-g)" /> + <param type="text" name="reverse_pcr_primer" value="" + label="Reverse PCR primer sequence" + help="Optional; must not include barcode or adapter sequence (-G)" /> + <param type="integer" name="trimming_threshold" value="20" + label="Threshold quality below which read will 
be trimmed" + help="Phred score; default is 20 (-q)" /> + <param type="integer" name="minimum_overlap" value="10" + label="Minimum overlap in bp between forward and reverse reads" + help="Default is 10 (-O)" /> + <param type="integer" name="minimum_length" value="200" + label="Minimum length in bp to keep sequence after overlapping" + help="Default is 200 (-L)" /> + <param type="integer" name="sliding_window_length" value="10" + label="Minimum length in bp to retain a read after trimming" + help="Supplied to Sickle; default is 10 (-l)" /> + <conditional name="pipeline"> + <param type="select" name="pipeline_name" + label="Pipeline to use for analysis"> + <option value="Vsearch" selected="true" >Vsearch</option> + <option value="DADA2">DADA2</option> + </param> + <when value="Vsearch"> + <param type="select" name="reference_database" + label="Reference database"> + <option value="" selected="true">GreenGenes</option> + <option value="-S">Silva</option> + <option value="-H">Human Oral Microbiome Database (HOMD)</option> + </param> + </when> + <when value="DADA2"> + </when> + </conditional> + </inputs> + <outputs> + <data format="tabular" name="metatable_mod" + label="${tool.name}:${title} Metatable_mod.txt" /> + <data format="tabular" name="read_counts_out" + label="${tool.name} (${pipeline.pipeline_name}):${title} read counts"> + <filter>pipeline['pipeline_name'] == 'Vsearch'</filter> + </data> + <data format="biom" name="tax_otu_table_biom_file" + label="${tool.name} (${pipeline.pipeline_name}):${title} tax OTU table (biom format)" /> + <data format="tabular" name="otus_tre_file" + label="${tool.name} (${pipeline.pipeline_name}):${title} otus.tre" /> + <data format="html" name="phylum_genus_dist_barcharts_html" + label="${tool.name} (${pipeline.pipeline_name}):${title} phylum genus dist barcharts HTML" /> + <data format="tabular" name="otus_count_file" + label="${tool.name} (${pipeline.pipeline_name}):${title} OTUs count file" /> + <data format="tabular" 
name="table_summary_file" + label="${tool.name} (${pipeline.pipeline_name}):${title} table summary file" /> + <data format="fasta" name="dereplicated_nonchimera_otus_fasta" + label="${tool.name} (${pipeline.pipeline_name}):${title} multiplexed linearized dereplicated mc2 repset nonchimeras OTUs FASTA" /> + <data format="html" name="fastqc_quality_boxplots_html" + label="${tool.name} (${pipeline.pipeline_name}):${title} FastQC per-base quality boxplots HTML" /> + <data format="pdf" name="heatmap_otu_table_pdf" + label="${tool.name} (${pipeline.pipeline_name}):${title} heatmap OTU table PDF" /> + <data format="html" name="beta_div_even_weighted_2d_plots" + label="${tool.name} (${pipeline.pipeline_name}):${title} beta diversity weighted 2D plots HTML" /> + <data format="html" name="beta_div_even_unweighted_2d_plots" + label="${tool.name} (${pipeline.pipeline_name}):${title} beta diversity unweighted 2D plots HTML" /> + <data format="html" name="alpha_div_rarefaction_plots" + label="${tool.name} (${pipeline.pipeline_name}):${title} alpha diversity rarefaction plots HTML" /> + <data format="html" name="dada2_error_rate_plots" + label="${tool.name} (${pipeline.pipeline_name}):${title} DADA2 error rate plots"> + <filter>pipeline['pipeline_name'] == 'DADA2'</filter> + </data> + <data format="html" name="alpha_div_boxplots" + label="${tool.name} (${pipeline.pipeline_name}):${title} alpha diversity boxplots"> + <filter>categories_file_in is not None</filter> + </data> + <data format="html" name="log_files" + label="${tool.name} (${pipeline.pipeline_name}):${title} log files" /> + </outputs> + <tests> + </tests> + <help><![CDATA[ + +What it does +------------ + +This pipeline has been designed for the analysis of 16S rRNA data from +Illumina Miseq (Casava >= 1.8) paired-end reads. + +Usage +----- + +1. 
Preparation of the mapping file and format of unique sample id +***************************************************************** + +Before using the amplicon analysis pipeline, follow the steps below to +avoid analysis failures and to ensure that samples are labelled +appropriately. Sample names are derived from the fastq file names +generated by the sequencer. The labels will include everything +between the beginning of the name and the sample number (from C11 +to S19 in Fig. 1). + +.. image:: Pipeline_description_Fig1.png + :height: 46 + :width: 382 + +**Figure 1** + +If analysing 16S data from multiple runs: + +Samples from different runs may have identical IDs. For example, +when sequencing the same samples twice, these could by chance be at +the same position in both runs. This would cause the fastq files +to have exactly the same IDs (Fig. 2). + +.. image:: Pipeline_description_Fig2.png + :height: 100 + :width: 463 + +**Figure 2** + +If sample IDs are identical, the pipeline will fail to run and +generate an error at the beginning of the analysis. + +To avoid having to change the file names, ensure before uploading +that the sample IDs are not repeated. + +2. Uploading the files +********************** + +Click on **Get Data/Upload File** in the Galaxy tool panel on the +left-hand side. + +From the pop-up window, choose how to upload the files. The +**Choose local file** option can be used for files up to 4Gb. Fastq files +from Illumina MiSeq will rarely be bigger than 4Gb, so this option is +recommended. + +After choosing the files, click **Start** to begin the upload. The window +can now be closed and the files will be uploaded onto the Galaxy server. +You will see the progress in the ``HISTORY`` panel on the right-hand +side of the screen. The colour will change from grey (queuing) to +yellow (uploading) and finally to green (uploaded). 
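A quick way to check for repeated sample IDs before uploading is to compare the fastq file names from all runs on the command line. This is an illustrative sketch, not part of the pipeline; the ``find_duplicate_ids`` helper name and the ``<SampleID>_R1...`` naming convention are assumptions:

```shell
# Print any sample ID that occurs more than once in a list of
# R1 fastq file names read from stdin (one name per line).
# Assumes names follow the "<SampleID>_R1..." convention.
find_duplicate_ids() {
    sed 's/_R1.*//' | sort | uniq -d
}

# Example: list R1 files from two run directories and report clashes
# ls RUN1/*_R1*.fastq.gz RUN2/*_R1*.fastq.gz | xargs -n1 basename | find_duplicate_ids
```

Any ID printed by this check should be renamed before the files are uploaded.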
+ +Once all the files are uploaded, click on the operations on multiple +datasets icon and select the fastq files that need to be analysed. +Click on the tab **For all selected...** and on the option +**Build List of Dataset pairs** (Fig. 3). + +.. image:: Pipeline_description_Fig3.png + :height: 247 + :width: 586 + +**Figure 3** + +Change the filter parameters ``_1`` and ``_2`` to ``_R1`` and ``_R2``. +The forward (R1) and reverse (R2) fastq files should now appear in the +corresponding columns. + +Select **Autopair**. This creates a collection of paired fastq files for +the forward and reverse reads for each sample. The names of the pairs are +the ones the pipeline will use. You are free to change the names at this +point as long as they are the same as those used in the Metatable file +(see section 3). + +Name the collection and click on **create list**. This reduces the time +required to input the forward and reverse reads for each individual sample. + +3. Create the Metatable files +***************************** + +Metatable.txt +~~~~~~~~~~~~~ + +Click on the list of pairs you just created to see the names of the +individual pairs. These are the names used by the pipeline and, +therefore, the names that need to be used in the Metatable file. + +The Metatable file has to be in QIIME format. You can find a description +of it on the QIIME website: http://qiime.org/documentation/file_formats.html + +EXAMPLE:: + + #SampleID BarcodeSequence LinkerPrimerSequence Disease Gender Description + Mock-RUN1 TAAGGCGAGCGTAAGA PsA Male Control + Mock-RUN2 CGTACTAGGCGTAAGA PsA Male Control + Mock-RUN3 AGGCAGAAGCGTAAGA PsC Female Control + +Briefly: the column ``LinkerPrimerSequence`` is empty but it cannot be +deleted. The header is very important: ``#SampleID``, ``BarcodeSequence``, +``LinkerPrimerSequence`` and ``Description`` are mandatory. Between +``LinkerPrimerSequence`` and ``Description`` you can add as many columns +as you want. 
For every column a PCoA plot will be created (see the +**Results** section). You can create this file in Excel and save it +as ``Text (Tab delimited)``. + +During the analysis the Metatable.txt file will be checked to ensure +that it has the correct format. If necessary, it will be modified and +made available as ``Metatable_mod.txt`` in the history panel. If you are +going to use the metatable file for any other statistical analyses, +remember to use the ``Metatable_mod.txt`` one, otherwise the sample +names might not match! + +Categories.txt (optional) +~~~~~~~~~~~~~~~~~~~~~~~~~ + +This file is required if you want box plots comparing the alpha +diversity indices (see the **Results** section). The file is a list +(without a header and IN ONE COLUMN) of categories present in the +Metatable.txt file. THE NAMES YOU ARE USING HAVE TO BE THE SAME AS THE +ONES USED IN THE METATABLE.TXT. You can create this file in Excel and +save it as ``Text (Tab delimited)``. + +EXAMPLE:: + + Disease + Gender + +Metatable and categories files can be uploaded using Get Data in the +same way as the fastq files. + +4. Analysis +*********** + +Under **Amplicon_Analysis_Pipeline**: + + * **Title** Name to distinguish between runs. It will be shown at + the beginning of each output file name. + + * **Input Metatable.txt file** Select the Metatable.txt file related to + this analysis. + + * **Input Categories.txt file (Optional)** Select the Categories.txt file + related to this analysis. + + * **Input FASTQ type** Select *Dataset pairs in a collection* and then + the collection of pairs you created earlier. + + * **Forward/Reverse PCR primer sequence** If the PCR primer sequences + were not removed by the MiSeq software during fastq creation, they + have to be removed before the analysis. Insert the PCR primer sequence + in the corresponding field. DO NOT include any barcode or adapter + sequence. 
If the PCR primers have already been trimmed by the MiSeq, + including the sequence in this field will lead to an error. + Only include the sequences if they are still present in the fastq files. + + * **Threshold quality below which reads will be trimmed** Choose the + Phred score used by Sickle to trim the reads at the 3’ end. + + * **Minimum length to retain a read after trimming** If the read length + after trimming is shorter than this user-defined length, the read, along + with the corresponding read pair, will be discarded. + + * **Minimum overlap in bp between forward and reverse reads** Choose the + minimum basepair overlap used by Pandaseq to assemble the reads. + Default is 10. + + * **Minimum length in bp to keep a sequence after overlapping** Choose the + minimum sequence length used by Pandaseq to keep a sequence after + overlapping. This depends on the expected amplicon length (e.g. 380 + for V3-V4 16S sequencing, where the expected length is ~440bp). + + * **Pipeline to use for analysis** Choose the pipeline to use for OTU + clustering and chimera removal. The Galaxy tool supports the ``Vsearch`` + and ``DADA2`` pipelines. + + * **Reference database** Choose between ``GreenGenes``, ``Silva`` or + ``HOMD`` (Human Oral Microbiome Database) for taxa assignment. + +Click on **Execute** to start the analysis. + +5. Results +********** + +Results are generated entirely using QIIME scripts and will appear +in the History panel when the analysis is completed. 
+ +The following outputs are captured: + + * **Vsearch_tax_OTU_table.biom|DADA2_tax_OTU_table.biom (biom format)** + The OTU table in BIOM format (http://biom-format.org/) + + * **otus.tre** Phylogenetic tree constructed using the ``make_phylogeny.py`` + (FastTree) QIIME script (http://qiime.org/scripts/make_phylogeny.html) + + * **Phylum_genus_dist_barcharts_HTML** HTML file with bar charts at + Phylum, Genus and Species level + (http://qiime.org/scripts/summarize_taxa.html and + http://qiime.org/scripts/plot_taxa_summary.html) + + * **OTUs_count_file** Summary of OTU counts per sample + (http://biom-format.org/documentation/summarizing_biom_tables.html) + + * **Table_summary_file** Summary of sequence counts per sample + (http://biom-format.org/documentation/summarizing_biom_tables.html) + + * **multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta|seqs.fa** + Fasta file with OTU sequences (Vsearch|DADA2) + + * **Heatmap_PDF** OTU heatmap in PDF format + (http://qiime.org/1.8.0/scripts/make_otu_heatmap_html.html) + + * **Vsearch_beta_diversity_weighted_2D_plots_HTML** PCoA plots in HTML + format using the weighted UniFrac distance measure. Samples are grouped + by the column names present in the Metatable file. The samples are + first rarefied to the minimum sequencing depth + (http://qiime.org/scripts/beta_diversity_through_plots.html) + + * **Vsearch_beta_diversity_unweighted_2D_plots_HTML** PCoA plots in HTML + format using the unweighted UniFrac distance measure. Samples are grouped + by the column names present in the Metatable file. 
The samples are + first rarefied to the minimum sequencing depth + (http://qiime.org/scripts/beta_diversity_through_plots.html) + +Code availability +----------------- + +**Code is available at** https://github.com/MTutino/Amplicon_analysis + +Credits +------- + +Pipeline author: Mauro Tutino + +Galaxy tool: Peter Briggs + + ]]></help> + <citations> + <citation type="bibtex"> + @misc{githubAmplicon_analysis, + author = {Tutino, Mauro}, + year = {2017}, + title = {Amplicon Analysis Pipeline}, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/MTutino/Amplicon_analysis}, +}</citation> + </citations> +</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/install_amplicon_analysis-1.3.5.sh Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,394 @@ +#!/bin/sh -e +# +# Prototype script to setup a conda environment with the +# dependencies needed for the Amplicon_analysis_pipeline +# script +# +# Handle command line +usage() +{ + echo "Usage: $(basename $0) [DIR]" + echo "" + echo "Installs the Amplicon_analysis_pipeline package plus" + echo "dependencies in directory DIR (or current directory " + echo "if DIR not supplied)" +} +if [ ! -z "$1" ] ; then + # Check if help was requested + case "$1" in + --help|-h) + usage + exit 0 + ;; + esac + # Assume it's the installation directory + cd $1 +fi +# Versions +PIPELINE_VERSION=1.3.5 +CONDA_REQUIRED_VERSION=4.6.14 +RDP_CLASSIFIER_VERSION=2.2 +# Directories +TOP_DIR=$(pwd)/Amplicon_analysis-${PIPELINE_VERSION} +BIN_DIR=${TOP_DIR}/bin +CONDA_DIR=${TOP_DIR}/conda +CONDA_BIN=${CONDA_DIR}/bin +CONDA_LIB=${CONDA_DIR}/lib +CONDA=${CONDA_BIN}/conda +ENV_NAME="amplicon_analysis_pipeline@${PIPELINE_VERSION}" +ENV_DIR=${CONDA_DIR}/envs/$ENV_NAME +# +# Functions +# +# Report failure and terminate script +fail() +{ + echo "" + echo ERROR $@ >&2 + echo "" + echo "$(basename $0): installation failed" + exit 1 +} +# +# Rewrite the shebangs in the installed conda scripts +# to remove the full path to conda 'bin' directory +rewrite_conda_shebangs() +{ + pattern="s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" + find ${CONDA_BIN} -type f -exec sed -i "$pattern" {} \; +} +# +# Reset conda version if required +reset_conda_version() +{ + CONDA_VERSION="$(${CONDA_BIN}/conda -V 2>&1 | head -n 1 | cut -d' ' -f2)" + echo conda version: ${CONDA_VERSION} + if [ "${CONDA_VERSION}" != "${CONDA_REQUIRED_VERSION}" ] ; then + echo "Resetting conda to last known working version $CONDA_REQUIRED_VERSION" + ${CONDA_BIN}/conda config --set allow_conda_downgrades true + ${CONDA_BIN}/conda install -y conda=${CONDA_REQUIRED_VERSION} + else + echo "conda version ok" + fi +} +# +# 
Install conda +install_conda() +{ + echo "++++++++++++++++" + echo "Installing conda" + echo "++++++++++++++++" + if [ -e ${CONDA_DIR} ] ; then + echo "*** $CONDA_DIR already exists ***" >&2 + return + fi + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh + bash ./Miniconda2-latest-Linux-x86_64.sh -b -p ${CONDA_DIR} + echo Installed conda in ${CONDA_DIR} + # Reset the conda version to a known working version + # (to avoid problems observed with e.g. conda 4.7.10) + echo "" + reset_conda_version + # Update the installation files + # This is to avoid problems when the length the installation + # directory path exceeds the limit for the shebang statement + # in the conda files + echo "" + echo -n "Rewriting conda shebangs..." + rewrite_conda_shebangs + echo "ok" + echo -n "Adding conda bin to PATH..." + PATH=${CONDA_BIN}:$PATH + echo "ok" + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# Create conda environment +install_conda_packages() +{ + echo "+++++++++++++++++++++++++" + echo "Installing conda packages" + echo "+++++++++++++++++++++++++" + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + cat >environment.yml <<EOF +name: ${ENV_NAME} +channels: + - defaults + - conda-forge + - bioconda +dependencies: + - python=2.7 + - cutadapt=1.8 + - sickle-trim=1.33 + - bioawk=1.0 + - pandaseq=2.8.1 + - spades=3.10.1 + - fastqc=0.11.3 + - qiime=1.9.1 + - blast-legacy=2.2.26 + - fasta-splitter=0.2.6 + - rdp_classifier=$RDP_CLASSIFIER_VERSION + - vsearch=2.10.4 + - r=3.5.1 + - r-tidyverse=1.2.1 + - bioconductor-dada2=1.8 + - bioconductor-biomformat=1.8.0 +EOF + ${CONDA} env create --name "${ENV_NAME}" -f environment.yml + echo Created conda environment in ${ENV_DIR} + cd $cwd + rm -rf $wd/* + rmdir $wd + # + # Patch qiime 1.9.1 tools to switch deprecated 'axisbg' + # matplotlib property to 'facecolor': + # https://matplotlib.org/api/prev_api_changes/api_changes_2.0.0.html + echo "" + for exe in 
make_2d_plots.py plot_taxa_summary.py ; do + echo -n "Patching ${exe}..." + find ${CONDA_DIR} -type f -name "$exe" -exec sed -i 's/axisbg=/facecolor=/g' {} \; + echo "done" + done + # + # Patch qiime 1.9.1 tools to switch deprecated 'set_axis_bgcolor' + # method call to 'set_facecolor': + # https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_axis_bgcolor.html + for exe in make_rarefaction_plots.py ; do + echo -n "Patching ${exe}..." + find ${CONDA_DIR} -type f -name "$exe" -exec sed -i 's/set_axis_bgcolor/set_facecolor/g' {} \; + echo "done" + done +} +# +# Install all the non-conda dependencies in a single +# function (invokes separate functions for each package) +install_non_conda_packages() +{ + echo "+++++++++++++++++++++++++++++" + echo "Installing non-conda packages" + echo "+++++++++++++++++++++++++++++" + # Temporary working directory + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + # Amplicon analysis pipeline + echo -n "Installing Amplicon_analysis_pipeline..." + if [ -e ${BIN_DIR}/Amplicon_analysis_pipeline.sh ] ; then + echo "already installed" + else + install_amplicon_analysis_pipeline + echo "ok" + fi + # ChimeraSlayer + echo -n "Installing ChimeraSlayer..." 
+ if [ -e ${BIN_DIR}/ChimeraSlayer.pl ] ; then + echo "already installed" + else + install_chimeraslayer + echo "ok" + fi + # Uclust + # This no longer seems to be available for download from + # drive5.com so don't download + echo "WARNING uclust not available: skipping installation" +} +# +# Amplicon analyis pipeline +install_amplicon_analysis_pipeline() +{ + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://github.com/MTutino/Amplicon_analysis/archive/${PIPELINE_VERSION}.tar.gz + tar zxf ${PIPELINE_VERSION}.tar.gz + cd Amplicon_analysis-${PIPELINE_VERSION} + INSTALL_DIR=${TOP_DIR}/share/amplicon_analysis_pipeline-${PIPELINE_VERSION} + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/amplicon_analysis_pipeline + for f in *.sh *.R ; do + /bin/cp $f $INSTALL_DIR + done + /bin/cp -r uc2otutab $INSTALL_DIR + mkdir -p ${BIN_DIR} + cat >${BIN_DIR}/Amplicon_analysis_pipeline.sh <<EOF +#!/usr/bin/env bash +# +# Point to Qiime config +export QIIME_CONFIG_FP=${TOP_DIR}/qiime/qiime_config +# Set up the RDP jar file +export RDP_JAR_PATH=${TOP_DIR}/share/rdp_classifier/rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar +# Set the Matplotlib backend +export MPLBACKEND="agg" +# Put the scripts onto the PATH +export PATH=${BIN_DIR}:${INSTALL_DIR}:\$PATH +# Activate the conda environment +export PATH=${CONDA_BIN}:\$PATH +source ${CONDA_BIN}/activate ${ENV_NAME} +# Execute the driver script with the supplied arguments +$INSTALL_DIR/Amplicon_analysis_pipeline.sh \$@ +exit \$? 
+EOF + chmod 0755 ${BIN_DIR}/Amplicon_analysis_pipeline.sh + cat >${BIN_DIR}/install_reference_data.sh <<EOF +#!/bin/bash -e +# +function usage() { + echo "Usage: \$(basename \$0) DIR" +} +if [ -z "\$1" ] ; then + usage + exit 0 +elif [ "\$1" == "--help" ] || [ "\$1" == "-h" ] ; then + usage + echo "" + echo "Install reference data into DIR" + exit 0 +fi +echo "==========================================" +echo "Installing Amplicon analysis pipeline data" +echo "==========================================" +if [ ! -e "\$1" ] ; then + echo "Making directory \$1" + mkdir -p \$1 +fi +cd \$1 +DATA_DIR=\$(pwd) +echo "Installing reference data under \$DATA_DIR" +$INSTALL_DIR/References.sh +echo "" +echo "Use '-r \$DATA_DIR' when running Amplicon_analysis_pipeline.sh" +echo "to use the reference data from this directory" +echo "" +echo "\$(basename \$0): finished" +EOF + chmod 0755 ${BIN_DIR}/install_reference_data.sh + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# ChimeraSlayer +install_chimeraslayer() +{ + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz + tar zxf microbiomeutil_2010-04-29.tar.gz + cd microbiomeutil_2010-04-29 + INSTALL_DIR=${TOP_DIR}/share/microbiome_chimeraslayer-2010-04-29 + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/microbiome_chimeraslayer + /bin/cp -r ChimeraSlayer $INSTALL_DIR + cat >${BIN_DIR}/ChimeraSlayer.pl <<EOF +#!/usr/bin/env bash +export PATH=$INSTALL_DIR:\$PATH +$INSTALL_DIR/ChimeraSlayer/ChimeraSlayer.pl \$@ +EOF + chmod 0755 ${INSTALL_DIR}/ChimeraSlayer/ChimeraSlayer.pl + chmod 0755 ${BIN_DIR}/ChimeraSlayer.pl + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# uclust required for QIIME/pyNAST +# License only allows this version to be used with those two packages +# See: http://drive5.com/uclust/downloads1_2_22q.html +install_uclust() +{ + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q 
http://drive5.com/uclust/uclustq1.2.22_i86linux64 + INSTALL_DIR=${TOP_DIR}/share/uclust-1.2.22 + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/uclust + /bin/mv uclustq1.2.22_i86linux64 ${INSTALL_DIR}/uclust + chmod 0755 ${INSTALL_DIR}/uclust + ln -s ${INSTALL_DIR}/uclust ${BIN_DIR} + cd $cwd + rm -rf $wd/* + rmdir $wd +} +setup_pipeline_environment() +{ + echo "+++++++++++++++++++++++++++++++" + echo "Setting up pipeline environment" + echo "+++++++++++++++++++++++++++++++" + # fasta_splitter.pl + echo -n "Setting up fasta_splitter.pl..." + if [ -e ${BIN_DIR}/fasta-splitter.pl ] ; then + echo "already exists" + elif [ ! -e ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ] ; then + echo "failed" + fail "fasta-splitter.pl not found" + else + ln -s ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ${BIN_DIR}/fasta-splitter.pl + echo "ok" + fi + # rdp_classifier.jar + local rdp_classifier_jar=rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar + echo -n "Setting up rdp_classifier.jar..." + if [ -e ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} ] ; then + echo "already exists" + elif [ ! -e ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ] ; then + echo "failed" + fail "rdp_classifier.jar not found" + else + mkdir -p ${TOP_DIR}/share/rdp_classifier + ln -s ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} + echo "ok" + fi + # qiime_config + echo -n "Setting up qiime_config..." 
+ if [ -e ${TOP_DIR}/qiime/qiime_config ] ; then + echo "already exists" + else + mkdir -p ${TOP_DIR}/qiime + cat >${TOP_DIR}/qiime/qiime_config <<EOF-qiime-config +qiime_scripts_dir ${ENV_DIR}/bin +EOF-qiime-config + echo "ok" + fi +} +# +# Top level script does the installation +echo "=======================================" +echo "Amplicon_analysis_pipeline installation" +echo "=======================================" +echo "Installing into ${TOP_DIR}" +if [ -e ${TOP_DIR} ] ; then + fail "Directory already exists" +fi +mkdir -p ${TOP_DIR} +install_conda +install_conda_packages +install_non_conda_packages +setup_pipeline_environment +echo "====================================" +echo "Amplicon_analysis_pipeline installed" +echo "====================================" +echo "" +echo "Install reference data using:" +echo "" +echo "\$ ${BIN_DIR}/install_reference_data.sh DIR" +echo "" +echo "Run pipeline scripts using:" +echo "" +echo "\$ ${BIN_DIR}/Amplicon_analysis_pipeline.sh ..." +echo "" +echo "(or add ${BIN_DIR} to your PATH)" +echo "" +echo "$(basename $0): finished" +## +#
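To make the shebang handling above concrete, the substitution performed by ``rewrite_conda_shebangs`` can be run in isolation on a sample script header. This is an illustrative sketch; the ``CONDA_BIN`` value and the ``rewrite_shebang`` name are assumptions:

```shell
# Same sed pattern as rewrite_conda_shebangs: replace an absolute
# conda-bin shebang with a portable /usr/bin/env one, avoiding the
# kernel's shebang length limit on deeply nested install paths.
CONDA_BIN=/tmp/Amplicon_analysis-1.3.5/conda/bin   # example path
rewrite_shebang() {
    sed "s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g"
}
```

For example, a shebang like ``#!/tmp/Amplicon_analysis-1.3.5/conda/bin/python`` becomes ``#!/usr/bin/env python``.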
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/install_amplicon_analysis.sh Thu Dec 05 11:48:01 2019 +0000 @@ -0,0 +1,425 @@ +#!/bin/sh -e +# +# Prototype script to setup a conda environment with the +# dependencies needed for the Amplicon_analysis_pipeline +# script +# +# Handle command line +usage() +{ + echo "Usage: $(basename $0) [DIR]" + echo "" + echo "Installs the Amplicon_analysis_pipeline package plus" + echo "dependencies in directory DIR (or current directory " + echo "if DIR not supplied)" +} +if [ ! -z "$1" ] ; then + # Check if help was requested + case "$1" in + --help|-h) + usage + exit 0 + ;; + esac + # Assume it's the installation directory + cd $1 +fi +# Versions +PIPELINE_VERSION=1.2.3 +RDP_CLASSIFIER_VERSION=2.2 +# Directories +TOP_DIR=$(pwd)/Amplicon_analysis-${PIPELINE_VERSION} +BIN_DIR=${TOP_DIR}/bin +CONDA_DIR=${TOP_DIR}/conda +CONDA_BIN=${CONDA_DIR}/bin +CONDA_LIB=${CONDA_DIR}/lib +CONDA=${CONDA_BIN}/conda +ENV_NAME="amplicon_analysis_pipeline@${PIPELINE_VERSION}" +ENV_DIR=${CONDA_DIR}/envs/$ENV_NAME +# +# Functions +# +# Report failure and terminate script +fail() +{ + echo "" + echo ERROR $@ >&2 + echo "" + echo "$(basename $0): installation failed" + exit 1 +} +# +# Rewrite the shebangs in the installed conda scripts +# to remove the full path to conda 'bin' directory +rewrite_conda_shebangs() +{ + pattern="s,^#!${CONDA_BIN}/,#!/usr/bin/env ,g" + find ${CONDA_BIN} -type f -exec sed -i "$pattern" {} \; +} +# +# Install conda +install_conda() +{ + echo "++++++++++++++++" + echo "Installing conda" + echo "++++++++++++++++" + if [ -e ${CONDA_DIR} ] ; then + echo "*** $CONDA_DIR already exists ***" >&2 + return + fi + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh + bash ./Miniconda2-latest-Linux-x86_64.sh -b -p ${CONDA_DIR} + echo Installed conda in ${CONDA_DIR} + # Update the installation files + # This is to avoid problems when the length the 
installation + # directory path exceeds the limit for the shebang statement + # in the conda files + echo "" + echo -n "Rewriting conda shebangs..." + rewrite_conda_shebangs + echo "ok" + echo -n "Adding conda bin to PATH..." + PATH=${CONDA_BIN}:$PATH + echo "ok" + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# Create conda environment +install_conda_packages() +{ + echo "+++++++++++++++++++++++++" + echo "Installing conda packages" + echo "+++++++++++++++++++++++++" + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + cat >environment.yml <<EOF +name: ${ENV_NAME} +channels: + - defaults + - conda-forge + - bioconda +dependencies: + - python=2.7 + - cutadapt=1.11 + - sickle-trim=1.33 + - bioawk=1.0 + - pandaseq=2.8.1 + - spades=3.5.0 + - fastqc=0.11.3 + - qiime=1.8.0 + - blast-legacy=2.2.26 + - fasta-splitter=0.2.4 + - rdp_classifier=$RDP_CLASSIFIER_VERSION + - vsearch=1.1.3 + # Need to explicitly specify libgfortran + # version (otherwise get version incompatible + # with numpy=1.7.1) + - libgfortran=1.0 + # Compilers needed to build R + - gcc_linux-64 + - gxx_linux-64 + - gfortran_linux-64 +EOF + ${CONDA} env create --name "${ENV_NAME}" -f environment.yml + echo Created conda environment in ${ENV_DIR} + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# Install all the non-conda dependencies in a single +# function (invokes separate functions for each package) +install_non_conda_packages() +{ + echo "+++++++++++++++++++++++++++++" + echo "Installing non-conda packages" + echo "+++++++++++++++++++++++++++++" + # Temporary working directory + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + # Amplicon analysis pipeline + echo -n "Installing Amplicon_analysis_pipeline..." + if [ -e ${BIN_DIR}/Amplicon_analysis_pipeline.sh ] ; then + echo "already installed" + else + install_amplicon_analysis_pipeline + echo "ok" + fi + # ChimeraSlayer + echo -n "Installing ChimeraSlayer..." 
+ if [ -e ${BIN_DIR}/ChimeraSlayer.pl ] ; then + echo "already installed" + else + install_chimeraslayer + echo "ok" + fi + # Uclust + echo -n "Installing uclust for QIIME/pyNAST..." + if [ -e ${BIN_DIR}/uclust ] ; then + echo "already installed" + else + install_uclust + echo "ok" + fi + # R 3.2.1" + echo -n "Checking for R 3.2.1..." + if [ -e ${BIN_DIR}/R ] ; then + echo "R already installed" + else + echo "not found" + install_R_3_2_1 + fi +} +# +# Amplicon analyis pipeline +install_amplicon_analysis_pipeline() +{ + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://github.com/MTutino/Amplicon_analysis/archive/v${PIPELINE_VERSION}.tar.gz + tar zxf v${PIPELINE_VERSION}.tar.gz + cd Amplicon_analysis-${PIPELINE_VERSION} + INSTALL_DIR=${TOP_DIR}/share/amplicon_analysis_pipeline-${PIPELINE_VERSION} + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/amplicon_analysis_pipeline + for f in *.sh ; do + /bin/cp $f $INSTALL_DIR + done + /bin/cp -r uc2otutab $INSTALL_DIR + mkdir -p ${BIN_DIR} + cat >${BIN_DIR}/Amplicon_analysis_pipeline.sh <<EOF +#!/usr/bin/env bash +# +# Point to Qiime config +export QIIME_CONFIG_FP=${TOP_DIR}/qiime/qiime_config +# Set up the RDP jar file +export RDP_JAR_PATH=${TOP_DIR}/share/rdp_classifier/rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar +# Put the scripts onto the PATH +export PATH=${BIN_DIR}:${INSTALL_DIR}:\$PATH +# Activate the conda environment +export PATH=${CONDA_BIN}:\$PATH +source ${CONDA_BIN}/activate ${ENV_NAME} +# Execute the driver script with the supplied arguments +$INSTALL_DIR/Amplicon_analysis_pipeline.sh \$@ +exit \$? 
+EOF + chmod 0755 ${BIN_DIR}/Amplicon_analysis_pipeline.sh + cat >${BIN_DIR}/install_reference_data.sh <<EOF +#!/bin/bash -e +# +function usage() { + echo "Usage: \$(basename \$0) DIR" +} +if [ -z "\$1" ] ; then + usage + exit 0 +elif [ "\$1" == "--help" ] || [ "\$1" == "-h" ] ; then + usage + echo "" + echo "Install reference data into DIR" + exit 0 +fi +echo "==========================================" +echo "Installing Amplicon analysis pipeline data" +echo "==========================================" +if [ ! -e "\$1" ] ; then + echo "Making directory \$1" + mkdir -p \$1 +fi +cd \$1 +DATA_DIR=\$(pwd) +echo "Installing reference data under \$DATA_DIR" +$INSTALL_DIR/References.sh +echo "" +echo "Use '-r \$DATA_DIR' when running Amplicon_analysis_pipeline.sh" +echo "to use the reference data from this directory" +echo "" +echo "\$(basename \$0): finished" +EOF + chmod 0755 ${BIN_DIR}/install_reference_data.sh + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# ChimeraSlayer +install_chimeraslayer() +{ + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz + tar zxf microbiomeutil_2010-04-29.tar.gz + cd microbiomeutil_2010-04-29 + INSTALL_DIR=${TOP_DIR}/share/microbiome_chimeraslayer-2010-04-29 + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/microbiome_chimeraslayer + /bin/cp -r ChimeraSlayer $INSTALL_DIR + cat >${BIN_DIR}/ChimeraSlayer.pl <<EOF +#!/usr/bin/env bash +export PATH=$INSTALL_DIR:\$PATH +$INSTALL_DIR/ChimeraSlayer/ChimeraSlayer.pl \$@ +EOF + chmod 0755 ${INSTALL_DIR}/ChimeraSlayer/ChimeraSlayer.pl + chmod 0755 ${BIN_DIR}/ChimeraSlayer.pl + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# uclust required for QIIME/pyNAST +# License only allows this version to be used with those two packages +# See: http://drive5.com/uclust/downloads1_2_22q.html +install_uclust() +{ + local wd=$(mktemp -d) + local cwd=$(pwd) + local wd=$(mktemp -d) + cd
$wd + wget -q http://drive5.com/uclust/uclustq1.2.22_i86linux64 + INSTALL_DIR=${TOP_DIR}/share/uclust-1.2.22 + mkdir -p $INSTALL_DIR + ln -s $INSTALL_DIR ${TOP_DIR}/share/uclust + /bin/mv uclustq1.2.22_i86linux64 ${INSTALL_DIR}/uclust + chmod 0755 ${INSTALL_DIR}/uclust + ln -s ${INSTALL_DIR}/uclust ${BIN_DIR} + cd $cwd + rm -rf $wd/* + rmdir $wd +} +# +# R 3.2.1 +# Can't use version from conda due to dependency conflicts +install_R_3_2_1() +{ + . ${CONDA_BIN}/activate ${ENV_NAME} + local cwd=$(pwd) + local wd=$(mktemp -d) + cd $wd + echo -n "Fetching R 3.2.1 source code..." + wget -q http://cran.r-project.org/src/base/R-3/R-3.2.1.tar.gz + echo "ok" + INSTALL_DIR=${TOP_DIR} + mkdir -p $INSTALL_DIR + echo -n "Unpacking source code..." + tar xzf R-3.2.1.tar.gz >INSTALL.log 2>&1 + echo "ok" + cd R-3.2.1 + echo -n "Running configure..." + ./configure --prefix=$INSTALL_DIR --with-x=no --with-readline=no >>INSTALL.log 2>&1 + echo "ok" + echo -n "Running make..." + make >>INSTALL.log 2>&1 + echo "ok" + echo -n "Running make install..." + make install >>INSTALL.log 2>&1 + echo "ok" + cd $cwd + rm -rf $wd/* + rmdir $wd + . ${CONDA_BIN}/deactivate +} +setup_pipeline_environment() +{ + echo "+++++++++++++++++++++++++++++++" + echo "Setting up pipeline environment" + echo "+++++++++++++++++++++++++++++++" + # vsearch113 + echo -n "Setting up vsearch113..." + if [ -e ${BIN_DIR}/vsearch113 ] ; then + echo "already exists" + elif [ ! -e ${ENV_DIR}/bin/vsearch ] ; then + echo "failed" + fail "vsearch not found" + else + ln -s ${ENV_DIR}/bin/vsearch ${BIN_DIR}/vsearch113 + echo "ok" + fi + # fasta_splitter.pl + echo -n "Setting up fasta_splitter.pl..." + if [ -e ${BIN_DIR}/fasta-splitter.pl ] ; then + echo "already exists" + elif [ ! 
-e ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ] ; then + echo "failed" + fail "fasta-splitter.pl not found" + else + ln -s ${ENV_DIR}/share/fasta-splitter/fasta-splitter.pl ${BIN_DIR}/fasta-splitter.pl + echo "ok" + fi + # rdp_classifier.jar + local rdp_classifier_jar=rdp_classifier-${RDP_CLASSIFIER_VERSION}.jar + echo -n "Setting up rdp_classifier.jar..." + if [ -e ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} ] ; then + echo "already exists" + elif [ ! -e ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ] ; then + echo "failed" + fail "rdp_classifier.jar not found" + else + mkdir -p ${TOP_DIR}/share/rdp_classifier + ln -s ${ENV_DIR}/share/rdp_classifier/rdp_classifier.jar ${TOP_DIR}/share/rdp_classifier/${rdp_classifier_jar} + echo "ok" + fi + # qiime_config + echo -n "Setting up qiime_config..." + if [ -e ${TOP_DIR}/qiime/qiime_config ] ; then + echo "already exists" + else + mkdir -p ${TOP_DIR}/qiime + cat >${TOP_DIR}/qiime/qiime_config <<EOF-qiime-config +qiime_scripts_dir ${ENV_DIR}/bin +EOF-qiime-config + echo "ok" + fi +} +# +# Remove the compilers from the conda environment +# Not sure if this step is necessary +remove_conda_compilers() +{ + echo "+++++++++++++++++++++++++++++++++++++++++" + echo "Removing compilers from conda environment" + echo "+++++++++++++++++++++++++++++++++++++++++" + ${CONDA} remove -y -n ${ENV_NAME} gcc_linux-64 gxx_linux-64 gfortran_linux-64 +} +# +# Top level script does the installation +echo "=======================================" +echo "Amplicon_analysis_pipeline installation" +echo "=======================================" +echo "Installing into ${TOP_DIR}" +if [ -e ${TOP_DIR} ] ; then + fail "Directory already exists" +fi +mkdir -p ${TOP_DIR} +install_conda +install_conda_packages +install_non_conda_packages +setup_pipeline_environment +remove_conda_compilers +echo "====================================" +echo "Amplicon_analysis_pipeline installed" +echo "====================================" +echo "" 
+echo "Install reference data using:" +echo "" +echo "\$ ${BIN_DIR}/install_reference_data.sh DIR" +echo "" +echo "Run pipeline scripts using:" +echo "" +echo "\$ ${BIN_DIR}/Amplicon_analysis_pipeline.sh ..." +echo "" +echo "(or add ${BIN_DIR} to your PATH)" +echo "" +echo "$(basename $0): finished" +## +#
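The `setup_pipeline_environment` function above repeats one idempotent pattern per tool: skip if the link already exists, fail if the source file is missing, otherwise create the symlink. A minimal standalone sketch of that pattern, using throwaway temp paths (not the installer's real `${ENV_DIR}`/`${BIN_DIR}`):

```shell
# Sketch of the installer's link-or-fail pattern with throwaway paths
set -e
wd=$(mktemp -d)
mkdir -p "$wd/env/bin" "$wd/bin"
touch "$wd/env/bin/vsearch"          # stand-in for the conda-installed binary
if [ -e "$wd/bin/vsearch113" ] ; then
    result="already exists"         # re-running the installer is a no-op
elif [ ! -e "$wd/env/bin/vsearch" ] ; then
    result="failed"                 # source missing: the real script calls fail()
else
    ln -s "$wd/env/bin/vsearch" "$wd/bin/vsearch113"
    result="ok"
fi
echo "$result"
```

Running the same snippet a second time takes the first branch, which is what makes the real setup function safe to re-run.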
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/outputs.txt	Thu Dec 05 11:48:01 2019 +0000
@@ -0,0 +1,41 @@
+ok.. Metatable_log/Metatable_mod.txt
+ok.. Vsearch_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom
+ok.. Vsearch_OTU_tables/otus.tre
+ok.. RESULTS/Vsearch_gg/OTUs_count.txt
+ok.. RESULTS/Vsearch_gg/table_summary.txt
+ok.. Multiplexed_files/Vsearch_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta
+ok.. QUALITY_CONTROL/Reads_count.txt
+ok.. fastqc_quality_boxplots.html -> generated by the Python wrapper
+NO.. RESULTS/Vsearch_gg/Heatmap/js -> RESULTS/Vsearch_gg/Heatmap.pdf
+NO.. RESULTS/Vsearch_gg/Heatmap/otu_table.html -> MISSING
+ok.. RESULTS/Vsearch_gg/phylum_genus_charts/charts/
+ok.. RESULTS/Vsearch_gg/phylum_genus_charts/raw_data/
+ok.. RESULTS/Vsearch_gg/phylum_genus_charts/bar_charts.html
+ok.. RESULTS/Vsearch_gg/beta_div_even/weighted_2d_plot/*
+ok.. RESULTS/Vsearch_gg/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html
+ok.. RESULTS/Vsearch_gg/beta_div_even/unweighted_2d_plot/*
+ok.. RESULTS/Vsearch_gg/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html
+ok.. RESULTS/Vsearch_gg/Alpha_diversity/rarefaction_curves/rarefaction_plots.html
+ok.. RESULTS/Vsearch_gg/Alpha_diversity/rarefaction_curves/average_plots
+ok.. RESULTS/Vsearch_gg/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf
+
+??.. Metatable_log/Metatable_mod.txt
+NO.. DADA2_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom
+ok.. DADA2_OTU_tables/otus.tre
+ok.. RESULTS/DADA2_silva/OTUs_count.txt
+ok.. RESULTS/DADA2_silva/table_summary.txt
+ok.. Multiplexed_files/DADA2_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta --> DADA2_OTU_tables/seqs.fa
+NO.. QUALITY_CONTROL/Reads_count.txt -> Vsearch only
+ok.. fastqc_quality_boxplots.html -> generated by the Python wrapper
+NO.. RESULTS/DADA2_silva/Heatmap/js -> RESULTS/DADA2_silva/Heatmap.pdf
+NO.. RESULTS/DADA2_silva/Heatmap/otu_table.html
+ok.. RESULTS/DADA2_silva/phylum_genus_charts/charts/
+ok.. RESULTS/DADA2_silva/phylum_genus_charts/raw_data/
+ok.. RESULTS/DADA2_silva/phylum_genus_charts/bar_charts.html
+ok.. RESULTS/DADA2_silva/beta_div_even/weighted_2d_plot/*
+ok.. RESULTS/DADA2_silva/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html
+ok.. RESULTS/DADA2_silva/beta_div_even/unweighted_2d_plot/*
+ok.. RESULTS/DADA2_silva/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html
+ok.. RESULTS/DADA2_silva/Alpha_diversity/rarefaction_curves/rarefaction_plots.html
+ok.. RESULTS/DADA2_silva/Alpha_diversity/rarefaction_curves/average_plots
+ok.. RESULTS/DADA2_silva/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf -> missing? (didn't include categories?)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml	Thu Dec 05 11:48:01 2019 +0000
@@ -0,0 +1,16 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="amplicon_analysis_pipeline" version="1.3.5">
+        <install version="1.0">
+            <actions>
+                <action type="download_file">https://raw.githubusercontent.com/pjbriggs/Amplicon_analysis-galaxy/update-to-Amplicon_analysis_pipeline-1.3/install_amplicon_analysis-1.3.5.sh</action>
+                <action type="shell_command">
+                    sh ./install_amplicon_analysis-1.3.5.sh $INSTALL_DIR
+                </action>
+                <action type="set_environment">
+                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/Amplicon_analysis-1.3.5/bin</environment_variable>
+                </action>
+            </actions>
+        </install>
+    </package>
+</tool_dependency>
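The three `<action>` elements in `tool_dependencies.xml` boil down to: download the installer, run it with Galaxy's `$INSTALL_DIR`, then prepend the resulting `bin` directory to `PATH`. A hedged shell sketch of that sequence, with the download stubbed out by a placeholder script (the real installer is fetched from GitHub):

```shell
# Simulate Galaxy's three dependency-install actions; the installer is a stub
set -e
work=$(mktemp -d)
cd "$work"
INSTALL_DIR=$work/deps
mkdir -p "$INSTALL_DIR"
# 1. download_file: fetch install_amplicon_analysis-1.3.5.sh (stubbed here)
printf 'mkdir -p "$1/Amplicon_analysis-1.3.5/bin"\n' > install_amplicon_analysis-1.3.5.sh
# 2. shell_command: run the installer, passing $INSTALL_DIR
sh ./install_amplicon_analysis-1.3.5.sh "$INSTALL_DIR"
# 3. set_environment: prepend the tool's bin directory to PATH
export PATH="$INSTALL_DIR/Amplicon_analysis-1.3.5/bin:$PATH"
echo "$PATH" | cut -d: -f1
```

Note that the script name run in step 2 must match the basename of the downloaded file, since `download_file` saves it under the URL's basename.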
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/updating-to-pipeline-1.3-DADA2.txt	Thu Dec 05 11:48:01 2019 +0000
@@ -0,0 +1,58 @@
+Notes on updating Galaxy tool to pipeline 1.3 (DADA2)
+=====================================================
+
+Where stuff is:
+
+* projects/Amplicon_analysis-galaxy: git repo for the Galaxy tool
+  (these developments are in the 'update-to-Amplicon_analysis_pipeline-1.3'
+  branch, PR #50:
+  https://github.com/pjbriggs/Amplicon_analysis-galaxy/pull/50)
+
+* scratchpad/test_Amplicon_analysis_pipeline_DADA2: directory for
+  running/testing the updates
+
+So far:
+
+* Updated the installer for pipeline version 1.3.2
+
+* Have been trying to run the pipeline manually outside of Galaxy
+  on popov & CSF3:
+  -- DADA2 works on popov (can't remember if it works on CSF3)
+  -- Vsearch pipeline fails on popov and CSF3 (but the errors are
+     different)
+
+* Mauro is looking at fixing the errors while I carry on trying
+  to update the Galaxy tool
+
+Random notes from my notebook:
+
+p44:
+
+* DADA2 uses the NSLOTS environment variable from the local environment
+  (so it can get the number of cores on the cluster; if NSLOTS is not
+  set it falls back to the number of cores on the local machine)
+
+* DADA2 has new outputs:
+  -- DADA2_OTU_tables/Error_rate_plots/ <-- need to capture all
+     PDFs from this folder
+
+pp78-79:
+
+* The Galaxy wrapper could check that the 'Run' column is present in the
+  supplied metatable file (if it is missing then the pipeline will now
+  fail)
+
+* DADA2 has its own reference database
+
+* DADA2 produces the same outputs as Vsearch (with names changed from
+  "Vsearch_*" to "DADA2_*"), plus extras:
+  -- Vsearch_OTUs.tre -> otus.tre
+  -- Vsearch_multiplexed_linearised_dereplicated_mc2_repset_nonchimeras_OTUS.fasta -> seqs.fa
+  -- There might be issues with the heatmap
+
+p83: notes on progress...
+
+p95:
+
+* Confirms the heatmap is now e.g. RESULTS/Vsearch_silva/Heatmap.pdf
+  (instead of the HTML output)
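The NSLOTS behaviour noted on p44 (use the scheduler's slot count if set, otherwise the local core count) can be sketched as a one-line fallback; `ncores` is a hypothetical variable name and `nproc` (GNU coreutils) is assumed available:

```shell
# Prefer the scheduler-provided NSLOTS; fall back to counting local cores
ncores=${NSLOTS:-$(nproc)}
echo "Running with $ncores cores"
```

Grid Engine sets NSLOTS for batch jobs, so the same command line behaves sensibly both on the cluster and on a workstation.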