comparison rgPicardHsMetrics.xml @ 0:ff4ec13e496e draft

Uploaded tarball to repository
author devteam
date Tue, 23 Oct 2012 10:49:35 -0400
1 <tool name="SAM/BAM Hybrid Selection Metrics" id="PicardHsMetrics" version="1.56.0">
2 <description>for targeted resequencing data</description>
3 <command interpreter="python">
4
5 picard_wrapper.py -i "$input_file" -d "$html_file.files_path" -t "$html_file" --datatype "$input_file.ext"
6 --baitbed "$bait_bed" --targetbed "$target_bed" -n "$out_prefix" --tmpdir "${__new_file_path__}"
7 -j "\$JAVA_JAR_PATH/CalculateHsMetrics.jar"
8
9 </command>
10 <requirements><requirement type="package" version="1.56.0">picard</requirement></requirements>
11 <inputs>
12 <param format="sam,bam" name="input_file" type="data" label="SAM/BAM dataset to generate statistics for" />
13 <param name="out_prefix" value="Picard HS Metrics" type="text" label="Title for the output file" help="Use to remind you what the job was for." size="80" />
14 <param name="bait_bed" type="data" format="bed,interval" label="Bait intervals: Sequences for bait in the design" help="Note specific format requirements below!" size="80" />
15 <param name="target_bed" type="data" format="bed,interval" label="Target intervals: Sequences for targets in the design" help="Note specific format requirements below!" size="80" />
16 <!--
17
18 Users can be enabled to set Java heap size by uncommenting this option and adding '-x "$maxheap"' to the <command> tag.
19 If commented out the heapsize defaults to the value specified within picard_wrapper.py
20
21 <param name="maxheap" type="select"
22 help="If in doubt, try the default. If it fails with a complaint about java heap size, try increasing it please - larger jobs will require your own hardware."
23 label="Java heap size">
24 <option value="4G" selected = "true">4GB default </option>
25 <option value="8G" >8GB use if 4GB fails</option>
26 <option value="16G">16GB - try this if 8GB fails</option>
27 </param>
28
29 -->
30 </inputs>
31 <outputs>
32 <data format="html" name="html_file" label="${out_prefix}.html" />
33 </outputs>
34 <tests>
35 <test>
36 <!-- Uncomment this if maxheap parameter is enabled
37 <param name="maxheap" value="8G" />
38 -->
39 <param name="out_prefix" value="HSMetrics" />
40 <param name="input_file" value="picard_input_summary_alignment_stats.sam" ftype="sam" />
41 <param name="bait_bed" value="picard_input_bait.bed" />
42 <param name="target_bed" value="picard_input_bait.bed" />
43 <output name="html_file" file="picard_output_hs_transposed_summary_alignment_stats.html" ftype="html" lines_diff="212"/>
44 </test>
45 </tests>
46 <help>
47
48 .. class:: infomark
49
50 **Summary**
51
52 Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file.
53
54 .. class:: warningmark
55
56 **WARNING about bait and target files**
57
58 Picard is very fussy about the bait and target file format. If the files are not exactly right, it will fail with an error like this:
59
60 Exception in thread "main" net.sf.picard.PicardException: Invalid interval record contains 6 fields: chr1 45787123 45787316 CASO_22G_25063 1000 +
61
62 If you see an error like that from this tool, please do NOT report it to any of the Galaxy mailing lists, as it is not a bug!
63 It means you must reformat your bait and target files. Unfortunately, Galaxy cannot do that for you automatically.
64
65 The required format is described in the documentation at http://www.broadinstitute.org/gsa/wiki/index.php/Built-in_command-line_arguments
66 and the sample provided looks like this:
67
68 chr1 1104841 1104940 + target_1
69 chr1 1105283 1105599 + target_2
70 chr1 1105712 1105860 + target_3
71 chr1 1105960 1106119 + target_4
72
73 So your bait and target files MUST have exactly 5 tab-delimited columns (chr, start, end, strand and name), in exactly that order.
74 Note that the Picard-mandated SAM header described in the documentation linked above is added automatically by the tool in Galaxy.
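
The record in the error message above has the layout of a standard 6-column BED line (chr, start, end, name, score, strand). As a minimal sketch, and not part of this tool, the reordering into the 5 required columns could look like the Python below. The function name ``bed_to_interval_line`` and the fallback values are hypothetical. The sketch only reorders columns and does not adjust coordinates: BED starts are 0-based, whereas Picard interval lists use 1-based starts, so check which convention your file follows.

```python
def bed_to_interval_line(line, default_name="interval"):
    """Reorder one tab-delimited BED line into chr, start, end, strand, name.

    Assumes the BED column order chr, start, end, name, score, strand.
    A missing name or strand falls back to a default (an assumption of
    this sketch, not something Picard or Galaxy prescribes).
    """
    fields = line.rstrip("\r\n").split("\t")
    # Unpacking raises ValueError if the line has fewer than 3 columns.
    chrom, start, end = fields[:3]
    name = fields[3] if len(fields) >= 4 else default_name
    strand = fields[5] if len(fields) >= 6 else "+"
    if strand not in ("+", "-"):
        # BED allows "." for an unknown strand; "+" is an arbitrary choice here.
        strand = "+"
    return "\t".join([chrom, start, end, strand, name])


# The record from the error message above, written as a 6-column BED line.
bed_line = "chr1\t45787123\t45787316\tCASO_22G_25063\t1000\t+"
print(bed_to_interval_line(bed_line))
```

Applied to every line of a bait or target file, this yields the column order shown in the sample; the score column is dropped because the required format has no place for it.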
75
76 .. class:: infomark
77
78 **Picard documentation**
79
80 This is a Galaxy wrapper for CalculateHsMetrics.jar, a part of the external package Picard-tools_.
81
82 .. _Picard-tools: http://www.google.com/search?q=picard+samtools
83
84 -----
85
86 .. class:: infomark
87
88 **Inputs, outputs, and parameters**
89
90 Picard documentation says (reformatted for Galaxy):
91
92 Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file.
93
94 .. csv-table::
95 :header-rows: 1
96
97 "Option", "Description"
98 "BAIT_INTERVALS=File","An interval list file that contains the locations of the baits used. Required."
99 "TARGET_INTERVALS=File","An interval list file that contains the locations of the targets. Required."
100 "INPUT=File","An aligned SAM or BAM file. Required."
101 "OUTPUT=File","The output file to write the metrics to. Required. Cannot be used in conjunction with option(s) METRICS_FILE (M)"
102 "METRICS_FILE=File","Legacy synonym for OUTPUT, should not be used. Required. Cannot be used in conjunction with option(s) OUTPUT (O)"
103 "CREATE_MD5_FILE=Boolean","Whether to create an MD5 digest for any BAM files created. Default value: false"
104
105 HsMetrics
106
107 The set of metrics captured that are specific to a hybrid selection analysis.
108
109 Output Column Definitions::
110
111 1. BAIT_SET: The name of the bait set used in the hybrid selection.
112 2. GENOME_SIZE: The number of bases in the reference genome used for alignment.
113 3. BAIT_TERRITORY: The number of bases which have one or more baits on top of them.
114 4. TARGET_TERRITORY: The unique number of target bases in the experiment where target is usually exons etc.
115 5. BAIT_DESIGN_EFFICIENCY: Target territory / bait territory. 1 = perfectly efficient, 0.5 = half of baited bases are not target.
116 6. TOTAL_READS: The total number of reads in the SAM or BAM file examined.
117 7. PF_READS: The number of reads that pass the vendor's filter.
118 8. PF_UNIQUE_READS: The number of PF reads that are not marked as duplicates.
119 9. PCT_PF_READS: PF reads / total reads. The percent of reads passing filter.
120 10. PCT_PF_UQ_READS: PF Unique Reads / Total Reads.
121 11. PF_UQ_READS_ALIGNED: The number of PF unique reads that are aligned with mapping score > 0 to the reference genome.
122 12. PCT_PF_UQ_READS_ALIGNED: PF Reads Aligned / PF Reads.
123 13. PF_UQ_BASES_ALIGNED: The number of bases in the PF aligned reads that are mapped to a reference base. Accounts for clipping and gaps.
124 14. ON_BAIT_BASES: The number of PF aligned bases that mapped to a baited region of the genome.
125 15. NEAR_BAIT_BASES: The number of PF aligned bases that mapped to within a fixed interval of a baited region, but not on a baited region.
126 16. OFF_BAIT_BASES: The number of PF aligned bases that mapped neither on nor near a bait.
127 17. ON_TARGET_BASES: The number of PF aligned bases that mapped to a targeted region of the genome.
128 18. PCT_SELECTED_BASES: On+Near Bait Bases / PF Bases Aligned.
129 19. PCT_OFF_BAIT: The percentage of aligned PF bases that mapped neither on nor near a bait.
130 20. ON_BAIT_VS_SELECTED: The percentage of on+near bait bases that are on as opposed to near.
131 21. MEAN_BAIT_COVERAGE: The mean coverage of all baits in the experiment.
132 22. MEAN_TARGET_COVERAGE: The mean coverage of targets that received at least coverage depth = 2 at one base.
133 23. PCT_USABLE_BASES_ON_BAIT: The number of aligned, de-duped, on-bait bases out of the PF bases available.
134 24. PCT_USABLE_BASES_ON_TARGET: The number of aligned, de-duped, on-target bases out of the PF bases available.
135 25. FOLD_ENRICHMENT: The fold by which the baited region has been amplified above genomic background.
136 26. ZERO_CVG_TARGETS_PCT: The fraction of targets that did not reach coverage=2 over any base.
137 27. FOLD_80_BASE_PENALTY: The fold over-coverage necessary to raise 80% of bases in "non-zero-cvg" targets to the mean coverage level in those targets.
138 28. PCT_TARGET_BASES_2X: The percentage of ALL target bases achieving 2X or greater coverage.
139 29. PCT_TARGET_BASES_10X: The percentage of ALL target bases achieving 10X or greater coverage.
140 30. PCT_TARGET_BASES_20X: The percentage of ALL target bases achieving 20X or greater coverage.
141 31. PCT_TARGET_BASES_30X: The percentage of ALL target bases achieving 30X or greater coverage.
142 32. HS_LIBRARY_SIZE: The estimated number of unique molecules in the selected part of the library.
143 33. HS_PENALTY_10X: The "hybrid selection penalty" incurred to get 80% of target bases to 10X. This metric should be interpreted as: if I have a design with 10 megabases of target, and want to get 10X coverage I need to sequence until PF_ALIGNED_BASES = 10^7 * 10 * HS_PENALTY_10X.
144 34. HS_PENALTY_20X: The "hybrid selection penalty" incurred to get 80% of target bases to 20X. This metric should be interpreted as: if I have a design with 10 megabases of target, and want to get 20X coverage I need to sequence until PF_ALIGNED_BASES = 10^7 * 20 * HS_PENALTY_20X.
145 35. HS_PENALTY_30X: The "hybrid selection penalty" incurred to get 80% of target bases to 30X. This metric should be interpreted as: if I have a design with 10 megabases of target, and want to get 30X coverage I need to sequence until PF_ALIGNED_BASES = 10^7 * 30 * HS_PENALTY_30X.
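
The arithmetic behind BAIT_DESIGN_EFFICIENCY and the HS_PENALTY metrics can be sketched in a few lines of Python. This is an illustration only, not Picard's code: the function names are hypothetical, the input values (territory sizes and a penalty of 4.0) are made up, and 10 megabases is taken as 10,000,000 bases.

```python
def bait_design_efficiency(target_territory, bait_territory):
    """Target territory divided by bait territory (both in bases)."""
    return float(target_territory) / bait_territory


def required_pf_aligned_bases(target_bases, coverage, hs_penalty):
    """PF aligned bases to sequence: target size * desired coverage * penalty."""
    return target_bases * coverage * hs_penalty


# Made-up design: 5 Mb of target covered by 10 Mb of bait.
print(bait_design_efficiency(5000000, 10000000))

# Made-up run: 10 Mb of target, 10X desired coverage, HS_PENALTY_10X of 4.0.
print(required_pf_aligned_bases(10 * 10**6, 10, 4.0))
```

With these made-up numbers, the design efficiency is 0.5, and a 10-megabase design at 10X with a penalty of 4.0 would need 400 million PF aligned bases.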
146
147 .. class:: warningmark
148
149 **Warning on SAM/BAM quality**
150
151 Many SAM/BAM files produced externally and uploaded to Galaxy do not fully conform to the SAM/BAM specifications. Galaxy deals with this by using the **LENIENT**
152 flag when it runs Picard, which allows reads to be discarded if they are empty or do not map. This appears to be the only way to deal with SAM/BAM files that cannot otherwise be parsed.
153
154
155 </help>
156 </tool>