diff dada2_makeSequenceTable.xml @ 3:c3834c230b0a draft
planemo upload for repository https://github.com/bernt-matthias/mb-galaxy-tools/tree/topic/dada2/tools/dada2 commit 5b1603bbcd3f139cad5c876be83fcb39697b5613-dirty
author: matthias
date: Mon, 29 Apr 2019 09:00:48 -0400
parents: d2e7c5f8a9f7
children: ec4a183cc713
--- a/dada2_makeSequenceTable.xml	Tue Apr 09 07:10:43 2019 -0400
+++ b/dada2_makeSequenceTable.xml	Mon Apr 29 09:00:48 2019 -0400
@@ -20,13 +20,13 @@
 samples <- list()
 #for $s in $samples:
     #if $len($samples) == 1
-    samples <- $read_data($s)
+    samples <- readRDS('$s')
     #else
-    samples[["$s.element_identifier"]] <- $read_data($s)
+    samples[["$s.element_identifier"]] <- readRDS('$s')
     #end if
 #end for
 ## make sequence table
-seqtab <- makeSequenceTable(samples, orderBy = "$orderby")
+seqtab <- makeSequenceTable(samples, orderBy = "$orderBy")
 
 
 reads.per.seqlen <- tapply(colSums(seqtab), factor(nchar(getSequences(seqtab))), sum)
@@ -34,10 +34,10 @@
 
 #if $plot == "yes"
 pdf( '$plot_output' )
-ggplot(data=df, aes(x=length, y=count)) + 
-    geom_col() + 
+ggplot(data=df, aes(x=length, y=count)) +
+    geom_col() +
 #if $filter_cond.filter_select != "no"
-    geom_vline( xintercept=c($filter_cond.min-0.5, $filter_cond.max+0.5) ) + 
+    geom_vline( xintercept=c($filter_cond.min-0.5, $filter_cond.max+0.5) ) +
 #end if
     theme_bw()
 bequiet <- dev.off()
@@ -53,12 +53,12 @@
     </configfiles>
     <inputs>
         <param name="samples" type="data" multiple="true" format="@DADA_UNIQUES@" label="samples" />
-        <param name="orderby" type="select" label="Column order">
+        <param argument="orderBy" type="select" label="Column order">
             <option value="abundance">abundance</option>
             <option value="nsamples">nsamples</option>
         </param>
         <conditional name="filter_cond">
-            <param name="filter_select" type="select" label="Filter method">
+            <param name="filter_select" type="select" label="Length filter method">
                 <option value="no">No filter</option>
                 <option value="minmax">Specify minimum and maximum sequence lengths</option>
             </param>
@@ -76,38 +76,33 @@
             <filter>plot</filter>
         </data>
     </outputs>
-
+    <tests>
+        <test>
+            <param name="samples" ftype="dada2_mergepairs" value="mergePairs_F3D0.Rdata"/>
+            <output name="stable" value="makeSequenceTable_F3D0.tab" ftype="dada2_sequencetable" />
+        </test>
+    </tests>
     <help><![CDATA[
-This function constructs a sequence table (analogous to an OTU table) from the provided list of
-samples.
+Description
+...........
 
-Custom Reference data sets
---------------------------
-
-For ** taxonomy assignment ** the following is needed: 
+This function constructs a sequence table, more precisely an amplicon sequence variant (ASV) table, which is a higher-resolution version of the OTU table produced by traditional methods.
 
-- a reference fasta data base 
-- a comma separated list of taxonomic ranks present in the reference data base 
-
-The reference fasta data base for taxonomic assignment (fasta or compressed fasta) needs to encode the taxonomy corresponding to each sequence in the fasta header lines in the following fashion (note, the second sequence is not assigned down to level 6):
+The sequence table is a matrix with rows corresponding to (and named by) the samples, and columns corresponding to (and named by) the sequence variants.
 
-::
+Usage
+.....
 
-    >Level1;Level2;Level3;Level4;Level5;Level6;
-    ACCTAGAAAGTCGTAGATCGAAGTTGAAGCATCGCCCGATGATCGTCTGAAGCTGTAGCATGAGTCGATTTTCACATTCAGGGATACCATAGGATAC
-    >Level1;Level2;Level3;Level4;Level5;
-    CGCTAGAAAGTCGTAGAAGGCTCGGAGGTTTGAAGCATCGCCCGATGGGATCTCGTTGCTGTAGCATGAGTACGGACATTCAGGGATCATAGGATAC
+**Input**: The result of derepFastq, dada, or mergePairs.
 
-The list of required taxonomic ranks could be for instance: "Kingdom,Phylum,Class,Order,Family,Genus"
+**Output**: A data set of type dada2_sequencetable, i.e. a tabular file with a row for each sample and a column for each unique sequence across all samples. The columns are named by the sequence.
 
-The reference data base for ** species assignment ** is a fasta file (or compressed fasta file), with the id line formatted as follows:
-
-::
+Details
+.......
 
-    >ID Genus species
-    ACCTAGAAAGTCGTAGATCGAAGTTGAAGCATCGCCCGATGATCGTCTGAAGCTGTAGCATGAGTCGATTTTCACATTCAGGGATACCATAGGATAC
-    >ID Genus species
-    CGCTAGAAAGTCGTAGAAGGCTCGGAGGTTTGAAGCATCGCCCGATGGGATCTCGTTGCTGTAGCATGAGTACGGACATTCAGGGATCATAGGATAC
+Sequences that are much longer or shorter than expected may be the result of non-specific priming. You can remove sequences of non-target length by applying a length filter. This is analogous to “cutting a band” in silico to get amplicons of the targeted length.
+
+@HELP_OVERVIEW@
     ]]></help>
     <expand macro="citations"/>
 </tool>