Mercurial > repos > sanbi-uwc > mothur_test
diff README @ 0:ee4fee239fe7 draft default tip
planemo upload commit 68a4fd4cc5332c57ac39bef73db224425af0706c-dirty
author | sanbi-uwc |
---|---|
date | Fri, 03 Jun 2016 09:32:47 -0400 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README Fri Jun 03 09:32:47 2016 -0400 @@ -0,0 +1,163 @@ +Galaxy wrappers for the Mothur metagenomics tools (http://www.mothur.org/wiki/Main_Page) + +The Mothur Tool Suite repository: + - Provides Mothur wrappers for most Mothur tools + - Data type used by mothur and other metagenomics tools + - Downloads and builds Mothur on the Linux or Mac operating system + +Requirements: + - Build utilities (make, GCC, gfortran, etc) + - simplejson (pip install simplejson) + +Repository Dependency: + - BLAST Legacy ver. 2.2.26 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/) + - The repository name should be package_blast_2_2_26 so it matches with the tool dependency. + + +Manual installation for Mothur: + Install mothur v.1.33 on your galaxy system so galaxy can execute the mothur command + ( This version of wrappers is designed for Mothur version 1.33- it may work on later versions ) + http://www.mothur.org/wiki/Download_mothur + http://www.mothur.org/wiki/Installation + ( This Galaxy Mothur wrapper will invoke Mothur in command line mode: http://www.mothur.org/wiki/Command_line_mode ) + +TreeVector is also packaged with this Mothur package to view phylogenetic trees: + TreeVector is a utility to create and integrate phylogenetic trees as Scalable Vector Graphics (SVG) files. + TreeVector was written by Ralph_Pethica, Department_of_Computer_Science, University_of_Bristol + TreeVector: http://supfam.cs.bris.ac.uk/TreeVector/about.html + Install in galaxy: tool-data/shared/jars/TreeVector.jar + +Install reference data from silva and greengenes + RDP reference file (modified for mothur): + http://www.mothur.org/wiki/RDP_reference_files + - 16S rRNA reference (RDP): A collection of 9,662 bacterial and 384 archaeal 16S rRNA gene sequences with an improved taxonomy compared to version 6. + http://www.mothur.org/w/images/2/29/Trainset7_112011.rdp.zip + - 16S rRNA reference (PDS): The RDP reference with three sequences reversed and 119 mitochondrial 16S rRNA gene sequences added as members of the Rickettsiales + http://www.mothur.org/w/images/4/4a/Trainset7_112011.pds.zip + - 28S rRNA reference (RDP): A collection of 8506 reference 28S rRNA gene sequences from the Fungi that were curated by the Kuske lab + http://www.mothur.org/w/images/3/36/FungiLSU_train_v7.zip + Silva reference: + http://www.mothur.org/wiki/Silva_reference_files + - Bacterial references (14,956 sequences) + http://www.mothur.org/w/images/9/98/Silva.bacteria.zip + - Archaeal references (2,297 sequences) + http://www.mothur.org/w/images/3/3c/Silva.archaea.zip + - Eukaryotic references (1,238 sequences) + http://www.mothur.org/w/images/1/1a/Silva.eukarya.zip + - Silva-based alignment of template file for chimera.slayer (5,181 sequences) + http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip + Alignment database rRNA gene sequences: + http://www.mothur.org/wiki/Alignment_database + - greengenes reference alignment + http://www.mothur.org/w/images/7/72/Greengenes.alignment.zip + - SILVA (Silva reference) + http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip + Secondary structure mapping files: + http://www.mothur.org/wiki/Secondary_structure_map + http://www.mothur.org/w/images/6/6d/Silva_ss_map.zip + http://www.mothur.org/w/images/4/4b/Gg_ss_map.zip + Lane masks: + http://www.mothur.org/wiki/Lane_mask + greengenes-compatible mask: + - lane1241.gg.filter - A Lane Masks that comes with the greengenes arb database + http://www.mothur.org/w/images/2/2a/Lane1241.gg.filter + - lane1287.gg.filter - A Lane Masks that comes with the greengenes arb database + http://www.mothur.org/w/images/a/a0/Lane1287.gg.filter + - lane1349.gg.filter - Pat Schloss's transcription of the mask from the Lane paper + http://www.mothur.org/w/images/3/3d/Lane1349.gg.filter + SILVA-compatible mask: + - lane1349.silva.filter - Pat Schloss's transcription of the mask from the Lane paper + http://www.mothur.org/w/images/6/6d/Lane1349.silva.filter + Lookup Files for sff flow analysis using shhh.flows: + http://www.mothur.org/wiki/Alignment_database + + Example from UMN installation: (We also made these available in a Galaxy public data library) + /project/db/galaxy/mothur/Silva.bacteria.zip + /project/db/galaxy/mothur/silva.eukarya.fasta + /project/db/galaxy/mothur/Greengenes.alignment.zip + /project/db/galaxy/mothur/Silva.archaea.zip + /project/db/galaxy/mothur/Silva_ss_map.zip + /project/db/galaxy/mothur/silva.eukarya.ncbi.tax + /project/db/galaxy/mothur/Silva.gold.bacteria.zip + /project/db/galaxy/mothur/Silva.archaea/silva.archaea.silva.tax + /project/db/galaxy/mothur/Silva.archaea/silva.archaea.gg.tax + /project/db/galaxy/mothur/Silva.archaea/silva.archaea.rdp.tax + /project/db/galaxy/mothur/Silva.archaea/nogap.archaea.fasta + /project/db/galaxy/mothur/Silva.archaea/silva.archaea.ncbi.tax + /project/db/galaxy/mothur/Silva.archaea/silva.archaea.fasta + /project/db/galaxy/mothur/nogap.eukarya.fasta + /project/db/galaxy/mothur/silva.eukarya.silva.tax + /project/db/galaxy/mothur/silva.gold.align + /project/db/galaxy/mothur/silva.ss.map + /project/db/galaxy/mothur/gg.ss.map + /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.silva.tax + /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp6.tax + /project/db/galaxy/mothur/silva.bacteria/nogap.bacteria.fasta + /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.gg.tax + /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.ncbi.tax + /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.fasta + /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp.tax + /project/db/galaxy/mothur/Silva.eukarya.zip + /project/db/galaxy/mothur/Gg_ss_map.zip + /project/db/galaxy/mothur/core_set_aligned.imputed.fasta + /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.fasta + /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.tax + /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.fasta + /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.tax + /project/db/galaxy/mothur/RDP/trainset7_112011.pds.fasta + /project/db/galaxy/mothur/RDP/trainset7_112011.pds.tax + /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.fasta + /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.tax + +Add tool-data: (contains pointers to silva, greengenes, and RDP reference data) + tool-data/mothur_aligndb.loc + tool-data/mothur_map.loc + tool-data/mothur_taxonomy.loc + tool-data/shared/jars/TreeVector.jar + +################################################################ +#### If you are manually adding this to your local galaxy: #### +################################################################ + +add config files (*.xml) and wrapper code (*.py) from tools/mothur/* to your galaxy installation + +add datatype definition file: lib/galaxy/datatypes/metagenomics.py + +add the following import line to: lib/galaxy/datatypes/registry.py +import metagenomics # added for metagenomics mothur + +add datatypes to: datatypes_conf.xml + +add mothur tools to: tool_conf.xml + +############ DESIGN NOTES ######################################################################################################### +Each mothur command has it's own tool_config (.xml) file, but all call the same python wrapper code: mothur_wrapper.py + + (The environment variable MOTHUR_MAX_PROCESSORS can be used to limit the number of cpu processors used be mothur commands) + +* Every mothur tool will call mothur_wrapper.py script with a --cmd= parameter that gives the mothur command name. +* Every tool will produce the logfile of the mothur run as an output. +* When the outputs of a mothur command could be determined in advance, they are included in the --result= parameter to mothur_wrapper.py +* When the number of outputs cannot be determined in advance, the name patterns and datatypes of the ouputs + are included in the --new_datasets parameter to mothur_wrapper.py + +Here is an example call to the mothur_wrapper.py script with an explanation before each param : + mothur_wrapper.py + # name of a mothur command, this is required + --cmd='summary.shared' + # Galaxy output dataset list, these are output files that can be determined before the command is run + # The items in the list are separated by commas + # Each item contains a regex to match the output filename and a galaxy dataset filepath in which to copy the data (separated by :) + --result='^mothur.\S+\.logfile$:'/home/galaxy/data/database/files/002/dataset_2613.dat,'^\S+\.summary$:'/home/galaxy/data/database/files/002/dataset_2614.dat + # Galaxy output dataset extra_files_path direcotry in which to put all output files (usually the logfile extra_file path) + --outputdir='/home/galaxy/data/database/files/002/dataset_2613_files' + # The id of one of the galaxy outputs (e.g. the mothur logfile) used for dynamic dataset generation (when number of outputs not known in advance) + # see: ttp://bitbucket.org/galaxy/galaxy-central/wiki/ToolsMultipleOutput + --datasetid='2578' + # The galaxy directory in which to copy all output files for dynamic dataset generation (special galaxy tool param: $__new_file_path__) + --new_file_path='$__new_file_path__' + # specifies files to copy to the new_file_path + # The list is separated by commas + # Each item conatins: a regex pattern for matching filenames and a galaxy datatype (separated by :) + # The regex match.groups()[0] is used as the id name of the dataset, and must result in unique name for each output + --new_datasets='^\S+?\.((\S+)\.(unique|[0-9.]*)\.dist)$:lower.dist'