
--- a/tools/naive_variant_detector.xml	Mon Aug 26 12:45:02 2013 -0400
+++ b/tools/naive_variant_detector.xml	Mon Aug 26 17:47:25 2013 -0400
@@ -1,5 +1,5 @@
 <tool id="naive_variant_detector" name="Naive Variant Caller" version="0.0.1">
-  <description>on BAM files</description>
+  <description>tabulate variable sites from BAM datasets</description>
   <requirements>
     <requirement type="package" version="1.7.1">numpy</requirement>
     <requirement type="package" version="0.0.1">pyBamParser</requirement>
@@ -94,8 +94,8 @@
     <param name="use_strand" type="boolean" truevalue="--use_strand" falsevalue="" checked="False" label="Report counts by strand"/>

     <param name="coverage_dtype" type="select" label="Choose the dtype to use for storing coverage information" help="This affects the maximum recorded value for a position, e.g. uint8 would be 255 coverage, but will require the least amount of RAM">
-      <option value="uint8" selected="True">uint8</option>
-      <option value="uint16">uint16</option>
+      <option value="uint8">uint8</option>
+      <option value="uint16" selected="True">uint16</option>
       <option value="uint32">uint32</option>
       <option value="uint64">uint64</option>
     </param>
@@ -107,19 +107,74 @@
   <help>
 **What it does**

-This tool is a naive variant detector.
+This tool is a naive variant caller that processes aligned sequencing reads in BAM format and produces a VCF file containing per-position variant calls. Multiple BAM files can be provided as input, and read group information is used to make calls for individual samples.
+
+User-configurable options allow filtering out reads that do not pass mapping or base quality thresholds and positions that do not meet a minimum per-base read depth; users can also specify the ploidy and whether to consider each strand separately.
+
+In addition to calling alternate alleles based upon simple ratios of nucleotides at a position, per base nucleotide counts are also provided. A custom tag, NC, is used within the Genotype fields. The NC field is a comma-separated listing of nucleotide counts in the form of &lt;nucleotide&gt;=&lt;count&gt;, where a plus or minus character is prepended to indicate strand, if the strandedness option was specified.
+

 ------

 **Inputs**

-Accepts one or more BAM input files.
+Accepts one or more BAM input files and a reference genome from the built-in list or from a FASTA file in your history.


 **Outputs**

 The output is in VCF format.

+**Options**
+
+Reference Genome:
+
+  Ensure that you have selected the correct reference genome, either from the list of built-in genomes or by selecting the corresponding FASTA file from your history.
+
+Restrict to regions:
+
+ You can specify any number of regions for which you would like to receive results. Each region can be given as just a chromosome name, a chromosome name and start position, or a chromosome name with start and end positions.
+
+Minimum number of reads needed to consider a REF/ALT:
+
+ This value declares the minimum number of reads containing a particular base at each position in order to list and use said allele in genotyping calls. Default is 0.
+
+Minimum base quality:
+
+ The minimum base quality score needed for the position in a read to be used for nucleotide counts and genotyping. Default is no filter.
+
+Minimum mapping quality:
+
+ The minimum mapping quality score needed to consider a read for nucleotide counts and genotyping. Default is no filter.
+
+Ploidy:
+
+ The number of genotype calls to make at each reported position.
+
+Only write out positions with possible alternate alleles:
+
+ When set, only positions that have at least one non-reference nucleotide passing the declared filters will be present in the output.
+
+Report counts by strand:
+
+ When set, nucleotide counts (NC) will be reported in reference to the aligned read's source strand. Reported as: &lt;strand&gt;&lt;BASE&gt;=&lt;COUNT&gt;.
+
+Choose the dtype to use for storing coverage information:
+
+ This controls the maximum depth value that can be recorded for each nucleotide/position/strand (when specified). Smaller dtypes require less memory but have lower maximum values.
+
+ +--------+----------------------+
+ | name   | max value            |
+ +========+======================+
+ | uint8  | 255                  |
+ +--------+----------------------+
+ | uint16 | 65535                |
+ +--------+----------------------+
+ | uint32 | 4294967295           |
+ +--------+----------------------+
+ | uint64 | 18446744073709551615 |
+ +--------+----------------------+
+
 ------

 **Citation**