--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/getalleleseq.xml	Tue Jun 25 00:45:38 2013 -0400
@@ -0,0 +1,109 @@
+<tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True">
+  <description>Generate major and minor allele sequences from an alleles table</description>
+  <command interpreter="python">getalleleseq.py
+                                   $alleles
+                                -l $seq_length
+                                -j $major_seq
+                                -d $__new_file_path__
+                                -p $major_seq.id
+</command>
+  <inputs>
+    <param format="tabular" name="alleles" type="data" label="Table containing the major and minor allele base per position" help="Must be tabular and follow the *Count alleles* tool output format"/>
+    <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/>
+  </inputs>
+  <outputs>
+    <data format="fasta" name="major_seq"/>
+  </outputs>
+  <tests>
+    <test>
+      <param name="alleles" value="test-table-getalleleseq.tab"/>
+      <param name="seq_length" value="16569"/>
+      <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/>
+    </test>
+  </tests>
+
+  <help>
+
+
+The major allele sequence of a sample is simply the sequence consisting of the most frequent nucleotide per position.
+Replacing the major allele with the second most frequent allele at diploid positions generates the minor allele sequence.
+
+-----
+
+.. class:: infomark
+
+**What it does**
+
+It takes the table generated by the Count alleles tool and derives a major and a minor allele sequence per sample.
+Since all sequences share the same length, all the major allele sequences are written to a single file (with a proper header per sample)
+to create a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses.
+In contrast, the minor allele sequences are reported as separate FASTA files, one per sample, to ease their downstream manipulation.
+
+-----
+
+.. class:: warningmark
+
+**Note**
+
+Please follow the format described below for the input file.
+
+-----
+
+.. class:: infomark
+
+**Formats**
+
+**Count alleles tool output format**
+
+Columns::
+
+    1.  sample id
+    2.  chromosome
+    3.  position
+    4.  counts for A's
+    5.  counts for C's
+    6.  counts for G's
+    7.  counts for T's
+    8.  Coverage
+    9.  Number of alleles passing frequency threshold
+    10. Major allele
+    11. Minor allele
+    12. Minor allele frequency in position
+
+
+**FASTA multiple alignment**
+
+See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format
+
+-----
+
+**Example**
+
+- For the following dataset::
+
+    S9	chrM	3	3	0	2	214	219	0	T	A	0.013698630137
+    S9	chrM	4	3	249	3	0	255	0	C	N	0.0
+    S9	chrM	5	245	1	1	0	247	1	A	N	0.0
+    S11	chrM	6	0	292	0	0	292	1	C	.	0.0
+    S7	chrM	6	0	254	0	0	254	1	C	.	0.0
+    S9	chrM	6	2	306	2	0	310	0	C	N	0.0
+    S11	chrM	7	281	0	3	0	284	0	A	G	0.0105633802817
+    S7	chrM	7	249	0	2	0	251	1	A	G	0.00796812749004
+    etc. for all covered positions per sample...
+
+- Running this tool with background sequence length 16569 will produce 4 files::
+
+    1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11
+    2. minor allele sequence of sample S7
+    3. minor allele sequence of sample S9
+    4. minor allele sequence of sample S11
+
+-----
+
+**Citation**
+
+If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al. *In preparation.*
+(boris-at-bx.psu.edu)
+
+  </help>
+</tool>
\ No newline at end of file
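
Reviewer note: the help text above describes how the major and minor allele sequences are derived, but getalleleseq.py itself is not part of this changeset. The following is a minimal Python sketch of that derivation, not the author's implementation. Several things in it are assumptions: the function name `allele_sequences` is hypothetical, positions are taken to be 1-based, uncovered positions are filled with `N`, and a position is treated as "diploid" when column 9 (number of alleles passing the frequency threshold) equals 2 and column 11 holds one of A, C, G or T.

```python
from collections import defaultdict

NUCLEOTIDES = set("ACGT")


def allele_sequences(rows, seq_length, fill="N"):
    """Return {sample: (major_seq, minor_seq)} from Count alleles rows.

    Assumptions (not confirmed by the changeset): positions are 1-based,
    uncovered positions are filled with `fill`, and the minor allele is
    substituted only where column 9 equals 2 and column 11 is A, C, G or T.
    """
    major = defaultdict(lambda: [fill] * seq_length)
    minor_overrides = defaultdict(dict)

    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip lines that do not have the 12 documented columns
        sample = fields[0]
        pos = int(fields[2])
        n_alleles = int(fields[8])
        major_allele = fields[9]
        minor_allele = fields[10]
        if not 1 <= pos <= seq_length:
            continue  # ignore positions outside the background sequence
        major[sample][pos - 1] = major_allele
        if n_alleles == 2 and minor_allele in NUCLEOTIDES:
            minor_overrides[sample][pos - 1] = minor_allele

    result = {}
    for sample, bases in major.items():
        minor = list(bases)  # start from the major allele sequence
        for idx, base in minor_overrides[sample].items():
            minor[idx] = base  # swap in the minor allele at diploid positions
        result[sample] = ("".join(bases), "".join(minor))
    return result
```

Calling `allele_sequences(open("alleles.tab"), 16569)` on a table in the documented format (the file name is only an example) would yield one pair of equal-length strings per sample. The major strings could then be written together as one multi-FASTA file and each minor string to its own file, as the help text describes.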