
<tool id="dada2_derepFastq" name="dada2: derepFastq" version="@DADA2_VERSION@+galaxy@WRAPPER_VERSION@" profile="19.09">
    <description>dereplicate amplicon sequences</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <expand macro="stdio"/>
    <expand macro="version_command"/>
    <command detect_errors="exit_code"><![CDATA[
    Rscript '$dada2_script'
    ]]></command>
    <configfiles>
        <configfile name="dada2_script"><![CDATA[
library(dada2, quietly=TRUE)
derep <- derepFastq('$fls')

## write.table(derep\$uniques, file = '$derep', quote = F, sep = "\t", row.names = T, col.names = F)
saveRDS(derep, file='$derep')
    ]]></configfile>
    </configfiles>
    <inputs>
        <param argument="fls" type="data" format="fastq,fastq.gz" label="Short read data" />
    </inputs>
    <outputs>
        <data name="derep" format="dada2_derep" label="${tool.name} on ${on_string}"/>
    </outputs>
    <tests>
        <test>
            <param name="fls" value="filterAndTrim_F3D0_R1.fq.gz" ftype="fastqsanger.gz" />
            <output name="derep" value="derepFastq_F3D0_R1.Rdata" ftype="dada2_derep" />
        </test>
        <!-- second test creates the reverse-read input for the dada tool's tests; not strictly needed for testing this tool -->
        <test>
            <param name="fls" value="filterAndTrim_F3D0_R2.fq.gz" ftype="fastqsanger.gz" />
            <output name="derep" value="derepFastq_F3D0_R2.Rdata" ftype="dada2_derep" />
        </test>
    </tests>
    <help><![CDATA[
Description
...........

Dereplication combines all identical sequencing reads into “unique sequences” with a corresponding “abundance” equal to the number of reads with that unique sequence. Dereplication substantially reduces the computation time of the subsequent steps by eliminating redundant comparisons.
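
The idea can be illustrated with a minimal base R sketch. The reads below are made up for illustration, and this is not how dada2's derepFastq is implemented; it only shows how identical reads collapse into unique sequences with abundances::

    # Toy reads, invented for this example
    reads <- c("ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC")

    # Count identical reads and sort by decreasing abundance
    abund <- sort(table(reads), decreasing = TRUE)

    # Named integer vector: names are the unique sequences, values the abundances
    uniques <- setNames(as.integer(abund), names(abund))
    print(uniques)

Six reads collapse to three unique sequences, so later steps compare three sequences instead of six.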

Usage
.....

**Input** is a FASTQ dataset containing the filtered and trimmed reads of a sample.

**Output** is a dataset of type *dada2_derep*, an RDS file (written with R's saveRDS) containing the output of dada2's derepFastq function.

The output can be used as input for the *dada2: dada* tool, which infers the sample composition from the dereplicated sequences given an error model.
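
A downloaded output can also be inspected in R. The following is a sketch under stated assumptions: ``summarise_derep`` is a hypothetical helper (not part of dada2 or of this tool), the file name is a placeholder, and the ``uniques`` element is assumed to be the named abundance vector described in dada2's documentation of the derep class::

    # Hypothetical helper: summarise a downloaded dada2_derep dataset
    summarise_derep <- function(path) {
      derep <- readRDS(path)  # the tool writes its output with saveRDS
      list(
        n_unique = length(derep$uniques),  # number of unique sequences
        n_reads  = sum(derep$uniques),     # total reads (sum of abundances)
        top      = head(derep$uniques, 3)  # first three entries
      )
    }

    # Example call; "derep.rds" is a placeholder file name
    # summarise_derep("derep.rds")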

Details
.......

Dereplication in DADA2 differs from other pipelines in one crucial respect: DADA2 retains a summary of the quality information associated with each unique sequence. The consensus quality profile of a unique sequence is the average of the positional qualities from the dereplicated reads. These quality profiles inform the error model of the subsequent sample inference step, significantly increasing DADA2’s accuracy.
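
The averaging can be sketched in base R as follows. The reads and quality scores are invented, and the code is only an illustration of a per-position mean, not dada2's own implementation::

    # Toy data: three reads of length 4 with per-position quality scores
    reads <- c("ACGT", "ACGT", "TTGA")
    quals <- rbind(c(30, 32, 28, 20),
                   c(34, 30, 30, 24),
                   c(38, 38, 36, 30))

    # Sum the qualities of identical reads, then divide by their abundance
    sums   <- rowsum(quals, group = reads)
    counts <- table(reads)
    consensus <- sums / as.vector(counts[rownames(sums)])
    print(consensus)

Each row of ``consensus`` is the average quality profile of one unique sequence; the row for ACGT is the position-wise mean of the first two reads.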

@HELP_OVERVIEW@
    ]]></help>
    <expand macro="citations"/>
</tool>
author	matthias
date	Tue, 15 Oct 2019 07:25:03 -0400
parents	4861220ec0c9
children