comparison downsample.xml @ 1:03aeb837e398 draft default tip

Uploaded
author dave
date Tue, 01 Oct 2019 16:25:02 -0400
parents 20823bce09e7
children
comparison
0:20823bce09e7 1:03aeb837e398
1 <?xml version="1.0"?> 1 <?xml version="1.0"?>
2 <tool id="dynamic_downsample" name="Dynamically downsample" version="1.0.0"> 2 <tool id="dynamic_downsample" name="Downsample" version="1.0.0">
3 <description>reads to desired coverage</description> 3 <description>reads to desired coverage</description>
4 <requirements> 4 <requirements>
5 <requirement type="package" version="1.9">samtools</requirement> 5 <requirement type="package" version="1.9">samtools</requirement>
6 <requirement type="package" version="5.0.1">gawk</requirement> 6 <requirement type="package" version="5.0.1">gawk</requirement>
7 </requirements> 7 </requirements>
8 <command><![CDATA[ 8 <command><![CDATA[
9 if FACTOR=\$(samtools depth '$reads' | awk '{ a[i++]=\$3; } END { x=int((i+1)/2); if (x < (i+1)/2) y=(a[x-1]+a[x])/2; else y=a[x-1]; f = 1/(y/$coverage) ; if (f >= 1) exit 1 ; else print f }') ; 9 if FACTOR=\$(samtools depth '$reads' | awk '{ readcovs[x++]=\$3; } END { n = asort(readcovs) ; idx=int((x+1)/2) ; coverage = ((idx==(x+1)/2) ? readcovs[idx] : (readcovs[idx]+readcovs[idx+1])/2) ; factor = 1/(coverage/$target_coverage) ; if (factor >= 1) exit 1 ; else print factor }') ;
10 then samtools view '$reads' -s \$FACTOR -O $reads.datatype -o '$output' ; 10 then samtools view '$reads' -s \$FACTOR -O BAM -o '$output' -@ \${GALAXY_SLOTS:-1} ;
11 else ; 11 else samtools view -O BAM '$reads' -o '$output' ;
12 cp '$reads' '$output'
13 fi 12 fi
14 ]]> 13 ]]>
15 </command> 14 </command>
16 <inputs> 15 <inputs>
17 <param name="reads" type="data" format="sam,bam" label="Reads to downsample" /> 16 <param name="reads" type="data" format="sam,bam" label="Reads to downsample" />
18 <param name="coverage" type="integer" value="1000" label="Target coverage" /> 17 <param name="target_coverage" type="integer" value="1000" label="Target coverage" />
19 </inputs> 18 </inputs>
20 <outputs> 19 <outputs>
21 <data format="bam" name="output" label="${tool.name} on ${on_string} (Downsampled to ${coverage}x coverage)"> 20 <data format="bam" name="output" label="Downsample ${on_string} to ${target_coverage}x coverage" />
22 <change_format>
23 <when input="reads" value="sam" format="sam" />
24 </change_format>
25 </data>
26 </outputs> 21 </outputs>
27 <tests> 22 <tests>
23 <test>
24 <param name="reads" ftype="bam" value="downsample-in1.bam" />
25 <param name="target_coverage" value="100" />
26 <output name="output" file="downsample-out1.bam" />
27 </test>
28 </tests> 28 </tests>
29 <help> 29 <help><![CDATA[
30 .. role:: bash(code)
31 :language: bash
32
33
34 Dynamic Downsampling
35 ~~~~~~~~~~~~~~~~~~~~
36
37 A known issue in variant analysis is that sequencing small genomes, e.g. HIV
38 at 9.7 kb or the human mitochondrial genome at 16.6 kb, can easily produce
39 coverage above 10,000x. Coverage this deep can cause performance problems for
40 some variant callers, especially those that take a haplotype-based approach to
41 variant detection.
42
43 This tool mitigates that issue by downsampling its input to the target
44 coverage. It uses :bash:`samtools depth` to determine the median coverage of
45 the input file, then runs :bash:`samtools view -s` on the file if the factor
46 1 / (median coverage / target coverage) is less than 1.
47
48 .. code-block:: bash
49
50 -s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the fraction of templates/read pairs to keep; INT part sets seed)
51
52 The median coverage is determined by piping the output of :bash:`samtools depth`
53 through the following :bash:`awk` script, where :bash:`$target_coverage` is the
54 value specified in the tool form:
55
56 .. code-block:: awk
57
58 '{ readcovs[x++]=$3; }
59 END {
60 n = asort(readcovs) ;
61 idx=int((x+1)/2) ;
62 coverage = ((idx==(x+1)/2) ? readcovs[idx] : (readcovs[idx]+readcovs[idx+1])/2) ;
63 factor = 1/(coverage/$target_coverage) ;
64 if (factor >= 1) exit 1 ;
65 else print factor
66 }'
67
68 If :bash:`awk` exits with code 1, the tool writes the input to the output as BAM
69 without downsampling. If :bash:`awk` prints a factor instead, the tool runs
70 :bash:`samtools view -s` with that factor, 1 / (median coverage / target coverage).
71
72 ]]>
30 </help> 73 </help>
31 <citations> 74 <citations>
32 </citations> 75 </citations>
33 </tool> 76 </tool>
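
The median-and-factor step in the command block above can be tried without a BAM file. The following is a minimal sketch, not the tool's own code: `median_factor` is a hypothetical helper name, the depth values are made up, and `sort -n` stands in for the gawk-specific `asort` so the snippet runs under any POSIX awk. In the tool itself the depths come from column 3 of `samtools depth`.

```shell
# Hypothetical helper: read one depth per line on stdin, take the target
# coverage as $1, and print the subsampling fraction target/median.
# Exits with status 1 when no downsampling is needed (fraction >= 1),
# mirroring the exit-code convention used in the tool's command block.
median_factor() {
  sort -n | awk -v target="$1" '
    { d[NR] = $1 }                       # sorted depths, 1-indexed
    END {
      if (NR == 0) exit 1                # no positions reported
      if (NR % 2) median = d[(NR + 1) / 2]
      else        median = (d[NR / 2] + d[NR / 2 + 1]) / 2
      if (median <= 0) exit 1            # avoid division by zero
      factor = target / median           # same as 1 / (median / target)
      if (factor >= 1) exit 1
      print factor
    }'
}

# Made-up depths with median 2500 and a target of 1000x.
printf '%s\n' 4000 1000 2000 3000 | median_factor 1000   # prints 0.4
```

A printed value such as 0.4 is what would be passed to `samtools view -s`; a non-zero exit status corresponds to the branch that writes the reads out without subsampling.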