comparison README @ 0:ee4fee239fe7 draft default tip
planemo upload commit 68a4fd4cc5332c57ac39bef73db224425af0706c-dirty
| author | sanbi-uwc |
|---|---|
| date | Fri, 03 Jun 2016 09:32:47 -0400 |
| parents | |
| children |
| -1:000000000000 | 0:ee4fee239fe7 |
|---|---|
| 1 Galaxy wrappers for the Mothur metagenomics tools (http://www.mothur.org/wiki/Main_Page) | |
| 2 | |
| 3 The Mothur Tool Suite repository: | |
| 4 - Provides Mothur wrappers for most Mothur tools | |
| 5 - Provides the datatypes used by Mothur and other metagenomics tools | |
| 6 - Downloads and builds Mothur on the Linux or Mac operating system | |
| 7 | |
| 8 Requirements: | |
| 9 - Build utilities (make, GCC, gfortran, etc) | |
| 10 - simplejson (pip install simplejson) | |
| 11 | |
| 12 Repository Dependency: | |
| 13 - BLAST Legacy ver. 2.2.26 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/) | |
| 14 - The repository name should be package_blast_2_2_26 so it matches with the tool dependency. | |
| 15 | |
| 16 | |
| 17 Manual installation for Mothur: | |
| 18 Install mothur v.1.33 on your Galaxy system so that Galaxy can execute the mothur command | |
| 19 ( This version of the wrappers is designed for Mothur version 1.33; it may work with later versions ) | |
| 20 http://www.mothur.org/wiki/Download_mothur | |
| 21 http://www.mothur.org/wiki/Installation | |
| 22 ( This Galaxy Mothur wrapper will invoke Mothur in command line mode: http://www.mothur.org/wiki/Command_line_mode ) | |
| 23 | |
| 24 TreeVector is also packaged with this Mothur package to view phylogenetic trees: | |
| 25 TreeVector is a utility to create and integrate phylogenetic trees as Scalable Vector Graphics (SVG) files. | |
| 26 TreeVector was written by Ralph_Pethica, Department_of_Computer_Science, University_of_Bristol | |
| 27 TreeVector: http://supfam.cs.bris.ac.uk/TreeVector/about.html | |
| 28 Install in galaxy: tool-data/shared/jars/TreeVector.jar | |
| 29 | |
| 30 Install reference data from silva and greengenes | |
| 31 RDP reference file (modified for mothur): | |
| 32 http://www.mothur.org/wiki/RDP_reference_files | |
| 33 - 16S rRNA reference (RDP): A collection of 9,662 bacterial and 384 archaeal 16S rRNA gene sequences with an improved taxonomy compared to version 6. | |
| 34 http://www.mothur.org/w/images/2/29/Trainset7_112011.rdp.zip | |
| 35 - 16S rRNA reference (PDS): The RDP reference with three sequences reversed and 119 mitochondrial 16S rRNA gene sequences added as members of the Rickettsiales | |
| 36 http://www.mothur.org/w/images/4/4a/Trainset7_112011.pds.zip | |
| 37 - 28S rRNA reference (RDP): A collection of 8506 reference 28S rRNA gene sequences from the Fungi that were curated by the Kuske lab | |
| 38 http://www.mothur.org/w/images/3/36/FungiLSU_train_v7.zip | |
| 39 Silva reference: | |
| 40 http://www.mothur.org/wiki/Silva_reference_files | |
| 41 - Bacterial references (14,956 sequences) | |
| 42 http://www.mothur.org/w/images/9/98/Silva.bacteria.zip | |
| 43 - Archaeal references (2,297 sequences) | |
| 44 http://www.mothur.org/w/images/3/3c/Silva.archaea.zip | |
| 45 - Eukaryotic references (1,238 sequences) | |
| 46 http://www.mothur.org/w/images/1/1a/Silva.eukarya.zip | |
| 47 - Silva-based alignment of template file for chimera.slayer (5,181 sequences) | |
| 48 http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip | |
| 49 Alignment database rRNA gene sequences: | |
| 50 http://www.mothur.org/wiki/Alignment_database | |
| 51 - greengenes reference alignment | |
| 52 http://www.mothur.org/w/images/7/72/Greengenes.alignment.zip | |
| 53 - SILVA (Silva reference) | |
| 54 http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip | |
| 55 Secondary structure mapping files: | |
| 56 http://www.mothur.org/wiki/Secondary_structure_map | |
| 57 http://www.mothur.org/w/images/6/6d/Silva_ss_map.zip | |
| 58 http://www.mothur.org/w/images/4/4b/Gg_ss_map.zip | |
| 59 Lane masks: | |
| 60 http://www.mothur.org/wiki/Lane_mask | |
| 61 greengenes-compatible mask: | |
| 62 - lane1241.gg.filter - A Lane mask that comes with the greengenes arb database | |
| 63 http://www.mothur.org/w/images/2/2a/Lane1241.gg.filter | |
| 64 - lane1287.gg.filter - A Lane mask that comes with the greengenes arb database | |
| 65 http://www.mothur.org/w/images/a/a0/Lane1287.gg.filter | |
| 66 - lane1349.gg.filter - Pat Schloss's transcription of the mask from the Lane paper | |
| 67 http://www.mothur.org/w/images/3/3d/Lane1349.gg.filter | |
| 68 SILVA-compatible mask: | |
| 69 - lane1349.silva.filter - Pat Schloss's transcription of the mask from the Lane paper | |
| 70 http://www.mothur.org/w/images/6/6d/Lane1349.silva.filter | |
| 71 Lookup Files for sff flow analysis using shhh.flows: | |
| 72 http://www.mothur.org/wiki/Alignment_database | |
| 73 | |
| 74 Example from UMN installation: (We also made these available in a Galaxy public data library) | |
| 75 /project/db/galaxy/mothur/Silva.bacteria.zip | |
| 76 /project/db/galaxy/mothur/silva.eukarya.fasta | |
| 77 /project/db/galaxy/mothur/Greengenes.alignment.zip | |
| 78 /project/db/galaxy/mothur/Silva.archaea.zip | |
| 79 /project/db/galaxy/mothur/Silva_ss_map.zip | |
| 80 /project/db/galaxy/mothur/silva.eukarya.ncbi.tax | |
| 81 /project/db/galaxy/mothur/Silva.gold.bacteria.zip | |
| 82 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.silva.tax | |
| 83 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.gg.tax | |
| 84 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.rdp.tax | |
| 85 /project/db/galaxy/mothur/Silva.archaea/nogap.archaea.fasta | |
| 86 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.ncbi.tax | |
| 87 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.fasta | |
| 88 /project/db/galaxy/mothur/nogap.eukarya.fasta | |
| 89 /project/db/galaxy/mothur/silva.eukarya.silva.tax | |
| 90 /project/db/galaxy/mothur/silva.gold.align | |
| 91 /project/db/galaxy/mothur/silva.ss.map | |
| 92 /project/db/galaxy/mothur/gg.ss.map | |
| 93 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.silva.tax | |
| 94 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp6.tax | |
| 95 /project/db/galaxy/mothur/silva.bacteria/nogap.bacteria.fasta | |
| 96 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.gg.tax | |
| 97 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.ncbi.tax | |
| 98 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.fasta | |
| 99 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp.tax | |
| 100 /project/db/galaxy/mothur/Silva.eukarya.zip | |
| 101 /project/db/galaxy/mothur/Gg_ss_map.zip | |
| 102 /project/db/galaxy/mothur/core_set_aligned.imputed.fasta | |
| 103 /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.fasta | |
| 104 /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.tax | |
| 105 /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.fasta | |
| 106 /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.tax | |
| 107 /project/db/galaxy/mothur/RDP/trainset7_112011.pds.fasta | |
| 108 /project/db/galaxy/mothur/RDP/trainset7_112011.pds.tax | |
| 109 /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.fasta | |
| 110 /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.tax | |
| 111 | |
| 112 Add tool-data: (contains pointers to silva, greengenes, and RDP reference data) | |
| 113 tool-data/mothur_aligndb.loc | |
| 114 tool-data/mothur_map.loc | |
| 115 tool-data/mothur_taxonomy.loc | |
| 116 tool-data/shared/jars/TreeVector.jar | |
| 117 | |
| 118 ################################################################ | |
| 119 #### If you are manually adding this to your local galaxy: #### | |
| 120 ################################################################ | |
| 121 | |
| 122 add config files (*.xml) and wrapper code (*.py) from tools/mothur/* to your galaxy installation | |
| 123 | |
| 124 add datatype definition file: lib/galaxy/datatypes/metagenomics.py | |
| 125 | |
| 126 add the following import line to: lib/galaxy/datatypes/registry.py | |
| 127 import metagenomics # added for metagenomics mothur | |
| 128 | |
| 129 add datatypes to: datatypes_conf.xml | |
| 130 | |
| 131 add mothur tools to: tool_conf.xml | |
| 132 | |
| 133 ############ DESIGN NOTES ######################################################################################################### | |
| 134 Each mothur command has its own tool_config (.xml) file, but all call the same python wrapper code: mothur_wrapper.py | |
| 135 | |
| 136 (The environment variable MOTHUR_MAX_PROCESSORS can be used to limit the number of CPU processors used by mothur commands) | |
| 137 | |
| 138 * Every mothur tool will call mothur_wrapper.py script with a --cmd= parameter that gives the mothur command name. | |
| 139 * Every tool will produce the logfile of the mothur run as an output. | |
| 140 * When the outputs of a mothur command can be determined in advance, they are included in the --result= parameter to mothur_wrapper.py | |
| 141 * When the number of outputs cannot be determined in advance, the name patterns and datatypes of the outputs | |
| 142 are included in the --new_datasets parameter to mothur_wrapper.py | |
| 143 | |
| 144 Here is an example call to the mothur_wrapper.py script, with an explanation before each parameter: | |
| 145 mothur_wrapper.py | |
| 146 # name of a mothur command, this is required | |
| 147 --cmd='summary.shared' | |
| 148 # Galaxy output dataset list, these are output files that can be determined before the command is run | |
| 149 # The items in the list are separated by commas | |
| 150 # Each item contains a regex to match the output filename and a galaxy dataset filepath in which to copy the data (separated by :) | |
| 151 --result='^mothur.\S+\.logfile$:'/home/galaxy/data/database/files/002/dataset_2613.dat,'^\S+\.summary$:'/home/galaxy/data/database/files/002/dataset_2614.dat | |
| 152 # Galaxy output dataset extra_files_path directory in which to put all output files (usually the logfile extra_file path) | |
| 153 --outputdir='/home/galaxy/data/database/files/002/dataset_2613_files' | |
| 154 # The id of one of the galaxy outputs (e.g. the mothur logfile) used for dynamic dataset generation (when number of outputs not known in advance) | |
| 155 # see: http://bitbucket.org/galaxy/galaxy-central/wiki/ToolsMultipleOutput | |
| 156 --datasetid='2578' | |
| 157 # The galaxy directory in which to copy all output files for dynamic dataset generation (special galaxy tool param: $__new_file_path__) | |
| 158 --new_file_path='$__new_file_path__' | |
| 159 # specifies files to copy to the new_file_path | |
| 160 # The list is separated by commas | |
| 161 # Each item contains: a regex pattern for matching filenames and a galaxy datatype (separated by :) | |
| 162 # The regex match.groups()[0] is used as the id name of the dataset, and must result in a unique name for each output | |
| 163 --new_datasets='^\S+?\.((\S+)\.(unique|[0-9.]*)\.dist)$:lower.dist' |
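
The --result and --new_datasets conventions described in the design notes above can be illustrated with a short Python sketch. This is not the code from mothur_wrapper.py: the function names (parse_spec_list, route_outputs, new_dataset_ids) and the sample filenames are hypothetical. The sketch assumes that no regex in a list contains a comma, and that everything after the last colon of an item is the dataset path or datatype.

```python
import re


def parse_spec_list(spec):
    """Split 'regex:value,regex:value' into [(compiled_regex, value), ...].

    Assumption: commas only separate items, and the last ':' in each item
    separates the regex from its value (a dataset path or a datatype).
    """
    pairs = []
    for item in spec.split(","):
        pattern, value = item.rsplit(":", 1)
        pairs.append((re.compile(pattern), value))
    return pairs


def route_outputs(filenames, result_spec):
    """Map each output filename to the dataset path of the first regex it matches."""
    pairs = parse_spec_list(result_spec)
    routed = {}
    for name in filenames:
        for regex, dest in pairs:
            if regex.match(name):
                routed[name] = dest
                break
    return routed


def new_dataset_ids(filenames, new_datasets_spec):
    """Return (filename, dataset id name, datatype) for dynamically discovered outputs.

    The id name is match.groups()[0], as described in the README, so each
    regex needs at least one capture group.
    """
    pairs = parse_spec_list(new_datasets_spec)
    found = []
    for name in filenames:
        for regex, datatype in pairs:
            match = regex.match(name)
            if match:
                found.append((name, match.groups()[0], datatype))
                break
    return found


if __name__ == "__main__":
    # Hypothetical filenames, used only to exercise the patterns from the example call.
    spec = r"^\S+?\.((\S+)\.(unique|[0-9.]*)\.dist)$:lower.dist"
    for row in new_dataset_ids(["sample.fn.unique.dist", "sample.fn.0.03.dist"], spec):
        print(row)
```

Because the id name comes from the first capture group, two output files whose first groups are identical would collide, which is why the README requires a unique name for each output.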
