comparison README.rst @ 0:b433086738d6 draft

planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit ba3e5b591407db52a586361efb21927c8171ec0e
author pjbriggs
date Wed, 08 Nov 2017 08:43:02 -0500
parents
children a00f366adc45
-1:000000000000 0:b433086738d6
1 Amplicon_analysis-galaxy
2 ========================
3
4 A Galaxy tool wrapper for Mauro Tutino's ``Amplicon_analysis`` pipeline
5 script at https://github.com/MTutino/Amplicon_analysis
6
7 The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq
8 (Casava >= 1.8) and performs the following operations:
9
10 * QC and clean up of input data
11 * Removal of singletons and chimeras and building of OTU table
12 and phylogenetic tree
13 * Beta and alpha diversity analysis
14
15 Usage documentation
16 ===================
17
18 Usage of the tool (including required inputs) is documented within
19 the ``help`` section of the tool XML.
20
21 Installing the tool in a Galaxy instance
22 ========================================
23
24 The tool is not currently hosted on a Galaxy toolshed, so both the tool
25 files and the dependencies must be installed manually. In addition,
26 it is necessary to fetch and install the reference data.
27
28 1. Install the dependencies
29 ---------------------------
30
31 The ``install_tool_deps.sh`` script can be used to fetch and install the
32 dependencies locally, for example::
33
34 install_tool_deps.sh /path/to/local_tool_dependencies
35
36 This can take some time to complete. When finished it should have
37 created a set of directories containing the dependencies under the
38 specified top level directory.
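
As a quick check after the install script finishes, the directories it created can be listed. The sketch below assumes a two-level ``<name>/<version>`` layout under the top level directory (the layout the ``galaxy_packages`` resolver used later in this document expects); the function name ``list_tool_deps`` is hypothetical and is not part of ``install_tool_deps.sh``.

```shell
# Hypothetical helper: list each <name>/<version> directory under the
# given top-level dependency directory.
# Assumes a two-level name/version layout; adjust if yours differs.
list_tool_deps() {
    top_dir="$1"
    if [ ! -d "$top_dir" ]; then
        echo "No such directory: $top_dir" >&2
        return 1
    fi
    # -mindepth/-maxdepth 2 selects exactly the <name>/<version> level
    find "$top_dir" -mindepth 2 -maxdepth 2 -type d | sort
}
```

For example, ``list_tool_deps /path/to/local_tool_dependencies`` should print one line per installed package version if the layout assumption holds.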
39
40 2. Install the tool files
41 -------------------------
42
43 There are two files to install:
44
45 * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
46 * ``amplicon_analysis_pipeline.py`` (the Python wrapper script)
47
48 Put these in a directory that is visible to Galaxy (e.g. a
49 ``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
50 file to tell Galaxy to offer the tool, for example by adding the line::
51
52 <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
53
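The copy step could be scripted along the following lines. This is only a sketch: the function name ``install_tool_files`` and its two arguments (the directory holding the tool files and the Galaxy root directory) are assumptions made for illustration, and the ``tools/Amplicon_analysis/`` destination simply follows the example above.

```shell
# Hypothetical helper: copy the two tool files into a directory under
# Galaxy's tools/ folder.
# Arguments: <dir containing the tool files> <Galaxy root dir>
install_tool_files() {
    src_dir="$1"
    galaxy_root="$2"
    dest_dir="$galaxy_root/tools/Amplicon_analysis"
    mkdir -p "$dest_dir" || return 1
    for f in amplicon_analysis_pipeline.xml amplicon_analysis_pipeline.py; do
        # Stop at the first file that cannot be copied
        cp "$src_dir/$f" "$dest_dir/" || return 1
    done
    echo "Installed tool files in $dest_dir"
}
```

The ``<tool file=... />`` line still has to be added to the Galaxy tool configuration by hand.
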
54 3. Install the reference data
55 -----------------------------
56
57 The script ``References.sh`` from the pipeline package at
58 https://github.com/MTutino/Amplicon_analysis can be run to install
59 the reference data, for example::
60
61 cd /path/to/pipeline/data
62 wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
63 /bin/bash ./References.sh
64
65 This will install the data in ``/path/to/pipeline/data``.
66
67 **NB** The final amount of data downloaded and uncompressed will be
68 around 6GB.
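
Given the size of the download, it may be worth confirming that the target filesystem has enough free space before running ``References.sh``. The following is a minimal sketch; the function name ``has_free_space`` is hypothetical, and the 6GB figure is the final size quoted above, so extra headroom may be needed while archives are being uncompressed.

```shell
# Hypothetical helper: succeed if the filesystem holding <dir> has at
# least <gb> gigabytes free.
# Arguments: <dir> <required space in GB>
has_free_space() {
    dir="$1"
    required_gb="$2"
    # df -P gives POSIX one-line-per-filesystem output; with -k,
    # column 4 is the available space in kilobytes
    avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    required_kb=$((required_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$required_kb" ]
}
```

For example, ``has_free_space /path/to/pipeline/data 6 && /bin/bash ./References.sh`` would only start the download when at least 6GB are reported as available.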
69
70 4. Configure dependencies and reference data in Galaxy
71 ------------------------------------------------------
72
73 The final steps are to make your Galaxy installation aware of the
74 tool dependencies and reference data, so it can locate them both when
75 the tool is run.
76
77 To target the tool dependencies installed previously, add the
78 following lines to the ``dependency_resolvers_conf.xml`` file in the
79 Galaxy ``config`` directory::
80
81 <dependency_resolvers>
82 ...
83 <galaxy_packages base_path="/path/to/local_tool_dependencies" />
84 <galaxy_packages base_path="/path/to/local_tool_dependencies" versionless="true" />
85 ...
86 </dependency_resolvers>
87
88 (NB it is recommended to place these *before* the ``<conda ... />``
89 resolvers)
90
91 (If you're not familiar with dependency resolvers in Galaxy then
92 see the documentation at
93 https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html
94 for more details.)
95
96 The tool locates the reference data via an environment variable called
97 ``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
98 directory where the reference data has been installed.
99
100 There are various ways to do this, depending on how your Galaxy
101 installation is configured:
102
103 * **For local instances:** add a line to set it in the
104 ``config/local_env.sh`` file of your Galaxy installation, e.g.::
105
106 export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data
107
108 * **For production instances:** set the value in the ``job_conf.xml``
109 configuration file, e.g.::
110
111 <destination id="amplicon_analysis">
112 <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
113 </destination>
114
115 and then specify that the pipeline tool uses this destination::
116
117 <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>
118
119 (For more about job destinations see the Galaxy documentation at
120 https://galaxyproject.org/admin/config/jobs/#job-destinations)
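
Whichever method is used, a small check run in the same environment as the Galaxy jobs can confirm that the variable is visible and points at a real directory. This is a sketch only; the function name ``check_ref_data_path`` is hypothetical and is not part of the tool.

```shell
# Hypothetical helper: verify that AMPLICON_ANALYSIS_REF_DATA_PATH is
# set and refers to an existing directory.
check_ref_data_path() {
    if [ -z "${AMPLICON_ANALYSIS_REF_DATA_PATH:-}" ]; then
        echo "AMPLICON_ANALYSIS_REF_DATA_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$AMPLICON_ANALYSIS_REF_DATA_PATH" ]; then
        echo "Not a directory: $AMPLICON_ANALYSIS_REF_DATA_PATH" >&2
        return 1
    fi
    echo "Reference data path OK: $AMPLICON_ANALYSIS_REF_DATA_PATH"
}
```

The check does not inspect the contents of the directory, so it cannot tell whether the reference data itself is complete.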
121
122 5. Enable rendering of HTML outputs from pipeline
123 -------------------------------------------------
124
125 To ensure that HTML outputs are displayed correctly in Galaxy
126 (for example the Vsearch OTU table heatmaps), Galaxy needs to be
127 configured not to sanitize the outputs from the ``Amplicon_analysis``
128 tool.
129
130 Either:
131
132 * **For local instances:** set ``sanitize_all_html = False`` in
133 ``config/galaxy.ini`` (NB don't do this on production servers or
134 public instances!); or
135
136 * **For production instances:** add the ``Amplicon_analysis`` tool
137 to the display whitelist in the Galaxy instance:
138
139 - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
140 ``config/galaxy.ini`` and restart Galaxy;
141 - Go to ``Admin>Manage Display Whitelist``, check the box for
142 ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
143 search function to help locate it) and click on
144 ``Submit new whitelist`` to update the settings.
145
146 Additional details
147 ==================
148
149 Some other things to be aware of:
150
151 * Note that using the Silva database requires a minimum of 18GB of RAM
152
153 Known problems
154 ==============
155
156 * Only the ``VSEARCH`` pipeline in Mauro's script is currently
157 available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
158 pipelines have yet to be implemented.
159 * The images in the tool help section are not visible if the
160 tool has been installed locally, or if it has been installed in
161 a Galaxy instance which is served from a subdirectory.
162
163 These are both problems with Galaxy and not the tool; see
164 https://github.com/galaxyproject/galaxy/issues/4490 and
165 https://github.com/galaxyproject/galaxy/issues/1676
166
167 Appendix: availability of tool dependencies
168 ===========================================
169
170 The tool takes its dependencies from the underlying pipeline script (see
171 https://github.com/MTutino/Amplicon_analysis/blob/master/README.md
172 for details).
173
174 As noted above, currently the ``install_tool_deps.sh`` script can be
175 used to manually install the dependencies for a local tool install.
176
177 In principle these should also be available if the tool were installed
178 from a toolshed. However, it would be preferable in this case to get as
179 many of the dependencies as possible via the ``conda`` dependency
180 resolver.
181
182 The following are known to be available via conda, with the required
183 version:
184
185 - cutadapt 1.8.1
186 - sickle-trim 1.33
187 - bioawk 1.0
188 - fastqc 0.11.3
189 - R 3.2.0
190
191 Some dependencies are available but with the "wrong" versions:
192
193 - spades (need 3.5.0)
194 - qiime (need 1.8.0)
195 - blast (need 2.2.26)
196 - vsearch (need 1.1.3)
197
198 The following dependencies are currently unavailable:
199
200 - fasta_number (need 02jun2015)
201 - fasta-splitter (need 0.2.4)
202 - rdp_classifier (need 2.2)
203 - microbiomeutil (need r20110519)
204
205 (NB usearch 6.1.544 and 8.0.1623 are special cases which must be
206 handled outside of Galaxy's dependency management systems.)
207
208 History
209 =======
210
211 ========== ======================================================================
212 Version Changes
213 ---------- ----------------------------------------------------------------------
214 1.1.0 First official version on Galaxy toolshed.
215 1.0.6 Expand inline documentation to provide detailed usage guidance.
216 1.0.5 Updates including:
217
218 - Capture read counts from quality control as new output dataset
219 - Capture FastQC per-base quality boxplots for each sample as
220 new output dataset
221 - Add support for -l option (sliding window length for trimming)
222 - Default for -L set to "200"
223 1.0.4 Various updates:
224
225 - Additional outputs are captured when a "Categories" file is
226 supplied (alpha diversity rarefaction curves and boxplots)
227 - Sample names derived from Fastqs in a collection of pairs
228 are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
229 - Input Fastqs can now be of more general ``fastq`` type
230 - Log file outputs are captured in new output dataset
231 - User can specify a "title" for the job which is copied into
232 the dataset names (to distinguish outputs from different runs)
233 - Improved detection and reporting of problems with input
234 Metatable
235 1.0.3 Take the sample names from the collection dataset names when
236 using collection as input (this is now the default input mode);
237 collect additional output dataset; disable ``usearch``-based
238 pipelines (i.e. ``UPARSE`` and ``QIIME``).
239 1.0.2 Enable support for FASTQs supplied via dataset collections and
240 fix some broken output datasets.
241 1.0.1 Initial version
242 ========== ======================================================================