# HG changeset patch
# User nick
# Date 1448317688 18000
# Node ID 4633a25d8c19e033c7fafb0325f1997b4d8537dd
planemo upload commit 801bf168032a13f6405518bddb35a24c9e9a8cd4-dirty

diff -r 000000000000 -r 4633a25d8c19 align_families.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/align_families.xml Mon Nov 23 17:28:08 2015 -0500
@@ -0,0 +1,59 @@
+
+
+ from duplex sequencing data
+ align_families.py $input > $output
+
+
+
+
+
+
+
+
+ mafft
+ duplex
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This is for processing duplex sequencing data. It does a multiple sequence alignment on each (single-stranded) family of reads.
+
+-----
+
+**Input**
+
+This expects the output format of the "Make families" tool.
+
+-----
+
+**Output**
+
+The output is a tabular file where each line corresponds to a (single) read.
+
+The columns are::
+
+ 1: barcode (both tags)
+ 2: tag order in barcode ("ab" or "ba")
+ 3: read mate ("1" or "2")
+ 4: read name
+ 5: read sequence, aligned ("-" for gaps)
+ 6: read quality scores, aligned (" " for gaps)
+
+-----
+
+**Alignments**
+
+The alignments are done using MAFFT, specifically the command
+::
+
+ $ mafft --nuc --quiet family.fa > family.aligned.fa
+
+
+
diff -r 000000000000 -r 4633a25d8c19 duplex.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/duplex.xml Mon Nov 23 17:28:08 2015 -0500
@@ -0,0 +1,50 @@
+
+
+ from duplex sequencing data
+ duplex.fa
+ && awk -f $__tool_directory__/utils/outconv.awk -v target=1 duplex.fa > $output1
+ && awk -f $__tool_directory__/utils/outconv.awk -v target=2 duplex.fa > $output2
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ keep_sscs
+
+
+
+
+**What it does**
+
+This is for processing duplex sequencing data. It creates single-strand and duplex consensus reads from aligned read families.
+
+-----
+
+**Input**
+
+This expects the output format of the "Align families" tool.
+
+-----
+
+**Output**
+
+This will output final, duplex consensus reads in two FASTA files (first and second reads in the pairs).
+Optionally, you can save the single-strand reads too, in a separate FASTA file.
+
+
+
diff -r 000000000000 -r 4633a25d8c19 make_families.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/make_families.xml Mon Nov 23 17:28:08 2015 -0500
@@ -0,0 +1,79 @@
+
+
+ from duplex sequencing data
+ paste $fastq1 $fastq2
+ | paste - - - -
+ | awk -f $__tool_directory__/make-barcodes.awk -v TAG_LEN=$taglen -v INVARIANT=$invariant
+ | sort
+ > $output
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This tool is for processing raw duplex sequencing data, removing the barcodes and grouping by them into families of reads from the same fragment.
+
+-----
+
+**Output**
+
+The output will be a tabular file where each line corresponds to a pair of input reads.
+
+The columns are::
+
+ 1: barcode (both tags joined and ordered)
+ 2: tag order in barcode ("ab" or "ba")
+ 3: read1 name
+ 4: read1 sequence (minus the tag and invariant sequences)
+ 5: read1 quality scores (minus the same tag and invariant)
+ 6: read2 name
+ 7: read2 sequence (minus the tag and invariant sequences)
+ 8: read2 quality scores (minus the same tag and invariant)
+
+-----
+
+**Barcode creation**
+
+For each pair, the tool will remove the tag at the beginning of each read and create a barcode by concatenating the two tags. The order of the tags is determined by a string comparison so that it will make an identical barcode from pairs of either order. The original tag order will be noted in the second column.
+
+Since pairs from opposite strands will have the same tags, but in the reverse order, this produces the same barcode for reads from the same fragment, regardless of strand. Then a simple sort will group all reads from the same strand together, separated into strands by the different "order" values.
+
+Examples::
+
+ +---------------+-----------------+
+ | input tags    | output          |
+ +-------+-------+-------+---------+
+ | read1 | read2 | order | barcode |
+ +-------+-------+-------+---------+
+ | ATG   | CCT   | ab    | ATGCCT  |
+ +-------+-------+-------+---------+
+ | CCT   | ATG   | ba    | ATGCCT  |
+ +-------+-------+-------+---------+
+
+
+
diff -r 000000000000 -r 4633a25d8c19 tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Mon Nov 23 17:28:08 2015 -0500
@@ -0,0 +1,18 @@
+
+
+
+
+
+
+
+
+ https://github.com/makrutenko/duplex/archive/master.tar.gz
+ duplex-master
+ make
+
+ $INSTALL_DIR
+
+
+
+
+
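
The "Barcode creation" rule in the make_families.xml help text can be illustrated with a minimal Python sketch. This is not the make-barcodes.awk script that the tool actually runs. The function name `make_barcode` is hypothetical, and the sketch assumes that the "string comparison" is a plain lexicographic comparison that puts the smaller tag first, which is consistent with the two rows of the Examples table. How equal tags are labelled is also an assumption.

```python
def make_barcode(tag1, tag2):
    """Return (barcode, order) for the tags taken from read1 and read2.

    Assumption: the lexicographically smaller tag goes first. "ab" means
    the read1 tag came first in the barcode, "ba" means the read2 tag did.
    Equal tags are labelled "ab" here, which is a guess.
    """
    if tag1 <= tag2:
        return tag1 + tag2, "ab"
    return tag2 + tag1, "ba"


# The two pairs from the Examples table: same fragment, opposite strands.
pairs = [("ATG", "CCT"), ("CCT", "ATG")]
rows = [make_barcode(tag1, tag2) for tag1, tag2 in pairs]

# Sorting on (barcode, order) keeps both strands of a fragment adjacent
# while separating them by their "order" value.
for barcode, order in sorted(rows):
    print(barcode, order)
```

Both input pairs yield the barcode ATGCCT and differ only in the order column, which is why a plain sort is enough to bring the two strands of one fragment next to each other.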