comparison README @ 0:ee4fee239fe7 draft default tip

planemo upload commit 68a4fd4cc5332c57ac39bef73db224425af0706c-dirty
author sanbi-uwc
date Fri, 03 Jun 2016 09:32:47 -0400
1 Galaxy wrappers for the Mothur metagenomics tools (http://www.mothur.org/wiki/Main_Page)
2
3 The Mothur Tool Suite repository:
4 - Provides Mothur wrappers for most Mothur tools
5 - Provides the datatypes used by Mothur and other metagenomics tools
6 - Downloads and builds Mothur on the Linux or Mac operating system
7
8 Requirements:
9 - Build utilities (make, GCC, gfortran, etc.)
10 - simplejson (pip install simplejson)
11
12 Repository Dependency:
13 - BLAST Legacy ver. 2.2.26 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/)
14 - The repository name should be package_blast_2_2_26 so it matches with the tool dependency.
15
16
17 Manual installation for Mothur:
18 Install mothur v.1.33 on your Galaxy system so that Galaxy can execute the mothur command
19 ( These wrappers are designed for Mothur version 1.33; they may also work with later versions )
20 http://www.mothur.org/wiki/Download_mothur
21 http://www.mothur.org/wiki/Installation
22 ( This Galaxy Mothur wrapper will invoke Mothur in command line mode: http://www.mothur.org/wiki/Command_line_mode )
23
24 TreeVector is also packaged with this Mothur package to view phylogenetic trees:
25 TreeVector is a utility to create and integrate phylogenetic trees as Scalable Vector Graphics (SVG) files.
26 TreeVector was written by Ralph Pethica, Department of Computer Science, University of Bristol
27 TreeVector: http://supfam.cs.bris.ac.uk/TreeVector/about.html
28 Install in galaxy: tool-data/shared/jars/TreeVector.jar
29
30 Install reference data from RDP, Silva, and greengenes:
31 RDP reference file (modified for mothur):
32 http://www.mothur.org/wiki/RDP_reference_files
33 - 16S rRNA reference (RDP): A collection of 9,662 bacterial and 384 archaeal 16S rRNA gene sequences with an improved taxonomy compared to version 6.
34 http://www.mothur.org/w/images/2/29/Trainset7_112011.rdp.zip
35 - 16S rRNA reference (PDS): The RDP reference with three sequences reversed and 119 mitochondrial 16S rRNA gene sequences added as members of the Rickettsiales
36 http://www.mothur.org/w/images/4/4a/Trainset7_112011.pds.zip
37 - 28S rRNA reference (RDP): A collection of 8506 reference 28S rRNA gene sequences from the Fungi that were curated by the Kuske lab
38 http://www.mothur.org/w/images/3/36/FungiLSU_train_v7.zip
39 Silva reference:
40 http://www.mothur.org/wiki/Silva_reference_files
41 - Bacterial references (14,956 sequences)
42 http://www.mothur.org/w/images/9/98/Silva.bacteria.zip
43 - Archaeal references (2,297 sequences)
44 http://www.mothur.org/w/images/3/3c/Silva.archaea.zip
45 - Eukaryotic references (1,238 sequences)
46 http://www.mothur.org/w/images/1/1a/Silva.eukarya.zip
47 - Silva-based alignment of template file for chimera.slayer (5,181 sequences)
48 http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip
49 Alignment database rRNA gene sequences:
50 http://www.mothur.org/wiki/Alignment_database
51 - greengenes reference alignment
52 http://www.mothur.org/w/images/7/72/Greengenes.alignment.zip
53 - SILVA (Silva reference)
54 http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip
55 Secondary structure mapping files:
56 http://www.mothur.org/wiki/Secondary_structure_map
57 http://www.mothur.org/w/images/6/6d/Silva_ss_map.zip
58 http://www.mothur.org/w/images/4/4b/Gg_ss_map.zip
59 Lane masks:
60 http://www.mothur.org/wiki/Lane_mask
61 greengenes-compatible mask:
62 - lane1241.gg.filter - A Lane mask that comes with the greengenes arb database
63 http://www.mothur.org/w/images/2/2a/Lane1241.gg.filter
64 - lane1287.gg.filter - A Lane mask that comes with the greengenes arb database
65 http://www.mothur.org/w/images/a/a0/Lane1287.gg.filter
66 - lane1349.gg.filter - Pat Schloss's transcription of the mask from the Lane paper
67 http://www.mothur.org/w/images/3/3d/Lane1349.gg.filter
68 SILVA-compatible mask:
69 - lane1349.silva.filter - Pat Schloss's transcription of the mask from the Lane paper
70 http://www.mothur.org/w/images/6/6d/Lane1349.silva.filter
71 Lookup Files for sff flow analysis using shhh.flows:
72 http://www.mothur.org/wiki/Alignment_database
73
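A minimal sketch of fetching and unpacking one of the reference archives listed above, assuming Python 3 and that the URLs in this README still resolve (they may have moved since it was written). The function name `fetch_and_unpack` and the destination directory are hypothetical and are not part of this repository.

```python
import os
import urllib.request
import zipfile


def fetch_and_unpack(url, dest_dir):
    """Download a zip archive from `url` and extract it into `dest_dir`.

    Returns the sorted list of member names found in the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Keep the archive next to the extracted files, named after the URL's last segment.
    archive = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return sorted(zf.namelist())


# Example (not executed here; the target directory is an assumption):
# fetch_and_unpack("http://www.mothur.org/w/images/9/98/Silva.bacteria.zip",
#                  "/project/db/galaxy/mothur")
```

The archive is kept after extraction so that it can be re-extracted without downloading again.
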
74 Example from UMN installation: (We also made these available in a Galaxy public data library)
75 /project/db/galaxy/mothur/Silva.bacteria.zip
76 /project/db/galaxy/mothur/silva.eukarya.fasta
77 /project/db/galaxy/mothur/Greengenes.alignment.zip
78 /project/db/galaxy/mothur/Silva.archaea.zip
79 /project/db/galaxy/mothur/Silva_ss_map.zip
80 /project/db/galaxy/mothur/silva.eukarya.ncbi.tax
81 /project/db/galaxy/mothur/Silva.gold.bacteria.zip
82 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.silva.tax
83 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.gg.tax
84 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.rdp.tax
85 /project/db/galaxy/mothur/Silva.archaea/nogap.archaea.fasta
86 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.ncbi.tax
87 /project/db/galaxy/mothur/Silva.archaea/silva.archaea.fasta
88 /project/db/galaxy/mothur/nogap.eukarya.fasta
89 /project/db/galaxy/mothur/silva.eukarya.silva.tax
90 /project/db/galaxy/mothur/silva.gold.align
91 /project/db/galaxy/mothur/silva.ss.map
92 /project/db/galaxy/mothur/gg.ss.map
93 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.silva.tax
94 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp6.tax
95 /project/db/galaxy/mothur/silva.bacteria/nogap.bacteria.fasta
96 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.gg.tax
97 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.ncbi.tax
98 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.fasta
99 /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp.tax
100 /project/db/galaxy/mothur/Silva.eukarya.zip
101 /project/db/galaxy/mothur/Gg_ss_map.zip
102 /project/db/galaxy/mothur/core_set_aligned.imputed.fasta
103 /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.fasta
104 /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.tax
105 /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.fasta
106 /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.tax
107 /project/db/galaxy/mothur/RDP/trainset7_112011.pds.fasta
108 /project/db/galaxy/mothur/RDP/trainset7_112011.pds.tax
109 /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.fasta
110 /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.tax
111
112 Add tool-data: (contains pointers to silva, greengenes, and RDP reference data)
113 tool-data/mothur_aligndb.loc
114 tool-data/mothur_map.loc
115 tool-data/mothur_taxonomy.loc
116 tool-data/shared/jars/TreeVector.jar
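
The .loc files point Galaxy at the installed reference data. The sketch below generates entries for tool-data/mothur_aligndb.loc, assuming a two-column, tab-separated layout (display name, then file path). That layout and the display names are assumptions; check the .loc.sample files shipped with the repository for the actual column order. The helper `loc_lines` is hypothetical.

```python
def loc_lines(entries):
    """Render (display_name, path) pairs as tab-separated .loc lines."""
    return ["%s\t%s" % (name, path) for name, path in entries]


# Paths are taken from the UMN example listing; the labels are illustrative only.
aligndb = [
    ("silva bacteria", "/project/db/galaxy/mothur/silva.bacteria/silva.bacteria.fasta"),
    ("silva archaea", "/project/db/galaxy/mothur/Silva.archaea/silva.archaea.fasta"),
]

print("\n".join(loc_lines(aligndb)))
```
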
117
118 ################################################################
119 #### If you are manually adding this to your local galaxy: ####
120 ################################################################
121
122 add config files (*.xml) and wrapper code (*.py) from tools/mothur/* to your galaxy installation
123
124 add datatype definition file: lib/galaxy/datatypes/metagenomics.py
125
126 add the following import line to: lib/galaxy/datatypes/registry.py
127 import metagenomics # added for metagenomics mothur
128
129 add datatypes to: datatypes_conf.xml
130
131 add mothur tools to: tool_conf.xml
132
133 ############ DESIGN NOTES #########################################################################################################
134 Each mothur command has its own tool_config (.xml) file, but all call the same python wrapper code: mothur_wrapper.py
135
136 (The environment variable MOTHUR_MAX_PROCESSORS can be used to limit the number of cpu processors used by mothur commands)
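
One plausible way such a limit could be applied is sketched below. This README does not show how mothur_wrapper.py actually reads the variable, so the function `effective_processors` and its fallback behaviour are assumptions for illustration only.

```python
import os


def effective_processors(requested, environ=os.environ):
    """Clamp a requested processor count to MOTHUR_MAX_PROCESSORS, if it is set."""
    limit = environ.get("MOTHUR_MAX_PROCESSORS")
    if limit is None:
        return requested
    try:
        cap = int(limit)
    except ValueError:
        # Assumption: ignore a malformed limit rather than fail the job.
        return requested
    if cap < 1:
        return requested
    return min(requested, cap)
```

With MOTHUR_MAX_PROCESSORS=4 in the environment, a tool that requests 8 processors would run with 4, while a tool that requests 2 is unaffected.
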
137
138 * Every mothur tool will call mothur_wrapper.py script with a --cmd= parameter that gives the mothur command name.
139 * Every tool will produce the logfile of the mothur run as an output.
140 * When the outputs of a mothur command can be determined in advance, they are included in the --result= parameter to mothur_wrapper.py
141 * When the number of outputs cannot be determined in advance, the name patterns and datatypes of the outputs
142 are included in the --new_datasets parameter to mothur_wrapper.py
143
144 Here is an example call to the mothur_wrapper.py script, with an explanation before each parameter:
145 mothur_wrapper.py
146 # name of a mothur command, this is required
147 --cmd='summary.shared'
148 # Galaxy output dataset list, these are output files that can be determined before the command is run
149 # The items in the list are separated by commas
150 # Each item contains a regex to match the output filename and a galaxy dataset filepath in which to copy the data (separated by :)
151 --result='^mothur.\S+\.logfile$:'/home/galaxy/data/database/files/002/dataset_2613.dat,'^\S+\.summary$:'/home/galaxy/data/database/files/002/dataset_2614.dat
152 # Galaxy output dataset extra_files_path directory in which to put all output files (usually the logfile extra_file path)
153 --outputdir='/home/galaxy/data/database/files/002/dataset_2613_files'
154 # The id of one of the galaxy outputs (e.g. the mothur logfile) used for dynamic dataset generation (when number of outputs not known in advance)
155 # see: http://bitbucket.org/galaxy/galaxy-central/wiki/ToolsMultipleOutput
156 --datasetid='2578'
157 # The galaxy directory in which to copy all output files for dynamic dataset generation (special galaxy tool param: $__new_file_path__)
158 --new_file_path='$__new_file_path__'
159 # specifies files to copy to the new_file_path
160 # The list is separated by commas
161 # Each item contains: a regex pattern for matching filenames and a galaxy datatype (separated by :)
162 # The regex match.groups()[0] is used as the id name of the dataset, and must result in a unique name for each output
163 --new_datasets='^\S+?\.((\S+)\.(unique|[0-9.]*)\.dist)$:lower.dist'
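
The --result and --new_datasets formats described above can be illustrated with a short sketch. This is not the code in mothur_wrapper.py; the helper names (`parse_pairs`, `route_outputs`, `name_new_datasets`) and the sample filenames are hypothetical. It assumes the shell has already stripped the quotes, that no regex contains a comma, and that each item is split at its last colon.

```python
import re


def parse_pairs(spec):
    """Split 'regex:value,regex:value' into (compiled_regex, value) pairs."""
    pairs = []
    for item in spec.split(","):
        # Split at the last ':' so the value (path or datatype) stays intact.
        pattern, _, value = item.rpartition(":")
        pairs.append((re.compile(pattern), value))
    return pairs


def route_outputs(filenames, result_spec):
    """Map each produced file to the dataset path of the first matching pattern."""
    pairs = parse_pairs(result_spec)
    routed = {}
    for name in filenames:
        for regex, dest in pairs:
            if regex.match(name):
                routed[name] = dest
                break
    return routed


def name_new_datasets(filenames, new_datasets_spec):
    """Key dynamically discovered outputs by match.groups()[0], as described above."""
    pairs = parse_pairs(new_datasets_spec)
    named = {}
    for name in filenames:
        for regex, datatype in pairs:
            match = regex.match(name)
            if match:
                named[match.groups()[0]] = (name, datatype)
                break
    return named


# Hypothetical files left in the output directory after a run.
produced = ["mothur.1465000000.logfile", "final.an.summary", "final.an.unique.dist"]
result = (r"^mothur.\S+\.logfile$:/home/galaxy/data/database/files/002/dataset_2613.dat,"
          r"^\S+\.summary$:/home/galaxy/data/database/files/002/dataset_2614.dat")
new_datasets = r"^\S+?\.((\S+)\.(unique|[0-9.]*)\.dist)$:lower.dist"

print(route_outputs(produced, result))
print(name_new_datasets(produced, new_datasets))
```

Here the logfile and the summary are routed to the two known datasets, while final.an.unique.dist is not covered by --result and is instead picked up by the --new_datasets pattern under the id name an.unique.dist.
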