<tool id="dada2_makeSequenceTable" name="dada2: makeSequenceTable" version="@DADA2_VERSION@+galaxy@WRAPPER_VERSION@">
    <description>construct a sequence table (analogous to OTU table)</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <expand macro="version_command"/>
    <command detect_errors="exit_code"><![CDATA[
    Rscript '$dada2_script'
    ]]></command>
    <configfiles>
        <configfile name="dada2_script"><![CDATA[
@READ_FOO@

library(dada2, quietly=TRUE)
#if $plot == "yes"
library(ggplot2, quietly=TRUE)
#end if

samples <- list()
#for $s in $samples:
    #if $len($samples) == 1
    samples <- $read_data($s)
    #else
    samples[["$s.element_identifier"]] <- $read_data($s)
    #end if
#end for
## make sequence table
seqtab <- makeSequenceTable(samples, orderBy = "$orderby")


reads.per.seqlen <- tapply(colSums(seqtab), factor(nchar(getSequences(seqtab))), sum)
df <- data.frame(length=as.numeric(names(reads.per.seqlen)), count=reads.per.seqlen)

#if $plot == "yes"
pdf( '$plot_output' )
ggplot(data=df, aes(x=length, y=count)) +
    geom_col() +
#if $filter_cond.filter_select != "no"
    geom_vline( xintercept=c($filter_cond.min-0.5, $filter_cond.max+0.5) ) +
#end if
    theme_bw()
bequiet <- dev.off()
#end if

## filter by seqlengths
#if $filter_cond.filter_select != "no"
    ## drop=FALSE keeps seqtab a matrix when only one sample or one sequence remains
    seqtab <- seqtab[, nchar(colnames(seqtab)) %in% seq($filter_cond.min, $filter_cond.max), drop=FALSE]
#end if

write.table(seqtab, "$stable", quote=FALSE, sep="\t", row.names = TRUE, col.names = NA)
    ]]></configfile>
    </configfiles>
    <inputs>
        <param name="samples" type="data" multiple="true" format="@DADA_UNIQUES@" label="samples" />
        <param name="orderby" type="select" label="Column order">
            <option value="abundance">abundance</option>
            <option value="nsamples">nsamples</option>
        </param>
        <conditional name="filter_cond">
            <param name="filter_select" type="select" label="Filter method">
                <option value="no">No filter</option>
                <option value="minmax">Specify minimum and maximum sequence lengths</option>
            </param>
            <when value="no"/>
            <when value="minmax">
                <param name="min" type="integer" value="" label="Minimum sequence length"/>
                <param name="max" type="integer" value="" label="Maximum sequence length"/>
            </when>
        </conditional>
        <param name="plot" type="boolean" truevalue="yes" falsevalue="no" checked="true" label="plot sequence length distribution" />
    </inputs>
    <outputs>
        <data name="stable" format="dada2_sequencetable" label="${tool.name} on ${on_string}"/>
        <data name="plot_output" format="pdf" label="${tool.name} on ${on_string}: sequence length distribution">
            <filter>plot</filter>
        </data>
    </outputs>

    <help><![CDATA[
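This tool constructs a sequence table (analogous to an OTU table) from the provided samples.
Each row of the table is a sample, each column is a unique sequence, and each cell holds the
number of reads of that sequence in that sample.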
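
The optional length filter keeps only those columns (sequences) whose length lies between the chosen minimum and maximum, inclusive. A minimal base R sketch on a made-up table; the sample names, sequences, and limits are illustrative assumptions, not tool defaults:

::

    # toy sequence table: rows are samples, columns are sequences (hypothetical data)
    seqtab <- matrix(c(10, 0, 3, 5, 7, 1), nrow = 2,
                     dimnames = list(c("sampleA", "sampleB"),
                                     c("ACGT", "ACGTAC", "ACGTACGT")))
    min_len <- 5  # assumed minimum sequence length
    max_len <- 7  # assumed maximum sequence length
    # keep columns whose sequence length is within the limits;
    # drop = FALSE keeps the result a matrix even if a single column remains
    keep <- nchar(colnames(seqtab)) %in% seq(min_len, max_len)
    seqtab <- seqtab[, keep, drop = FALSE]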

Custom Reference data sets
--------------------------

For **taxonomy assignment** the following is needed:

- a reference FASTA database
- a comma-separated list of the taxonomic ranks present in the reference database

The reference FASTA database for taxonomic assignment (FASTA or compressed FASTA) must encode the taxonomy of each sequence in its header line, in the following fashion (note that the second sequence is not assigned down to level 6):

::

    >Level1;Level2;Level3;Level4;Level5;Level6;
    ACCTAGAAAGTCGTAGATCGAAGTTGAAGCATCGCCCGATGATCGTCTGAAGCTGTAGCATGAGTCGATTTTCACATTCAGGGATACCATAGGATAC
    >Level1;Level2;Level3;Level4;Level5;
    CGCTAGAAAGTCGTAGAAGGCTCGGAGGTTTGAAGCATCGCCCGATGGGATCTCGTTGCTGTAGCATGAGTACGGACATTCAGGGATCATAGGATAC

The list of required taxonomic ranks could be, for instance: "Kingdom,Phylum,Class,Order,Family,Genus"

The reference database for **species assignment** is a FASTA file (or compressed FASTA file), with the ID line formatted as follows:

::

    >ID Genus species
    ACCTAGAAAGTCGTAGATCGAAGTTGAAGCATCGCCCGATGATCGTCTGAAGCTGTAGCATGAGTCGATTTTCACATTCAGGGATACCATAGGATAC
    >ID Genus species
    CGCTAGAAAGTCGTAGAAGGCTCGGAGGTTTGAAGCATCGCCCGATGGGATCTCGTTGCTGTAGCATGAGTACGGACATTCAGGGATCATAGGATAC
    ]]></help>
    <expand macro="citations"/>
</tool>
author	matthias
date	Tue, 09 Apr 2019 07:10:43 -0400
parents	98e24c66eeb2
children	c3834c230b0a