# HG changeset patch
# User boris
# Date 1334949811 14400
# Node ID 5108a91b93fefe0fa00df2a9f5db938359a0d448
# Parent 20b654fe58e26c05ddf4be9f35df2a7b159370e1
Uploaded

diff -r 20b654fe58e2 -r 5108a91b93fe MDtag_filter.xml
--- a/MDtag_filter.xml Fri Apr 20 15:23:12 2012 -0400
+++ b/MDtag_filter.xml Fri Apr 20 15:23:31 2012 -0400
@@ -1,13 +1,28 @@
 on MD tag string
-  MDtag_filter.py $in_sam $n $m $out_sam
+  MDtag_filter.py $in_sam $n $m $out_sam ${create.choice} $discarded_sam
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+  (create['choice']=='yes')
+
@@ -16,11 +31,20 @@
+
+
+
+
+
+
+
+
-Mismatches at the start and end of a mapped read are most likely sequencing errors.
-This tool aims to control the variation noise due to sequencing errors.
+
+Mismatches at either end of a mapped read are most likely sequencing errors.
+This tool aims to control the variation noise due to potential sequencing errors.
 -----
@@ -28,8 +52,10 @@
 **What it does**
-This tool reads the MD tag string of mapped reads. It discards mapped reads that contain variation at either end.
-The user defines n and m. The mapped read is discarded if it contains any number of mismatches within **n** bases of the read start and within **m** bases of the read end.
+This tool reads the MD tag of mapped reads (see SAM format specification). The user defines the 5' and 3' windows **n** and **m** (in bp), respectively.
+The mapped read is discarded if it contains any number of mismatches within **n** bases of the read 5' end and within **m** bases of the read 3' end.
+The resulting SAM file is enriched for mapped reads that show internal variation (if any) over reads whose variation is found within the read ends.
+The user might also want to keep the discarded reads in an additional file.
 -----
@@ -37,7 +63,7 @@
 **Note**
-Mapped reads without an MD tag will be removed from the output SAM file.
+Mapped reads without an MD tag will be removed from the output SAM file(s).
 -----
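The filtering rule described in the help text can be sketched in Python. This is not the contents of MDtag_filter.py, which the patch does not show; the names `mismatch_offsets`, `keep_read` and `classify_sam_line` are hypothetical. The sketch assumes that "either end" means a read is discarded when a mismatch falls in the 5' window OR the 3' window. It also assumes that offsets are counted along the aligned bases recorded in the MD tag, so insertions and soft clips (which MD does not encode) are ignored. Finally, it assumes that for reverse-strand reads (FLAG 0x10) the 5' end lies at the right-hand end of the alignment.

```python
import re

# MD tag grammar from the SAM specification: [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*
MD_VALID = re.compile(r"\d+(?:(?:[A-Z]|\^[A-Z]+)\d+)*")
# One token per match run, deletion (^ plus reference bases) or mismatched base
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def mismatch_offsets(md):
    """Return (offsets, length): 0-based mismatch offsets along the aligned
    read bases, left to right, and the number of aligned bases MD covers."""
    if not MD_VALID.fullmatch(md):
        raise ValueError(f"malformed MD tag: {md!r}")
    offsets = []
    pos = 0
    for match_run, _deletion, mismatch in MD_TOKEN.findall(md):
        if match_run:
            pos += int(match_run)  # run of matching bases
        elif mismatch:
            offsets.append(pos)  # mismatched read base at this offset
            pos += 1
        # deletion: bases exist only in the reference, so pos does not move
    return offsets, pos


def keep_read(md, n, m, is_reverse=False):
    """True if no mismatch lies within n bp of the 5' end or m bp of the 3' end."""
    offsets, length = mismatch_offsets(md)
    # Assumption: SEQ of a reverse-strand read is reverse-complemented, so its
    # 5' end sits at the right-hand end of the alignment.
    left, right = (m, n) if is_reverse else (n, m)
    return all(left <= off < length - right for off in offsets)


def classify_sam_line(line, n, m):
    """Return "header", "keep", "discard", or "drop" (no MD tag) for one SAM line."""
    if line.startswith("@"):
        return "header"
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    # Optional fields start at column 12 (index 11); MD has type Z
    md = next((f[5:] for f in fields[11:] if f.startswith("MD:Z:")), None)
    if md is None:
        return "drop"
    return "keep" if keep_read(md, n, m, is_reverse=bool(flag & 0x10)) else "discard"
```

In this sketch, header lines would go to both outputs, "discard" lines to $discarded_sam when ${create.choice} is yes, and "drop" lines (no MD tag) to neither, in line with the Note.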