# HG changeset patch
# User Daniel Blankenberg
# Date 1377553645 14400
# Node ID d6d7aa386bad345e8e868fe0f347fe9c8c8227fd
# Parent a17fbdd7b47a2f3b0a4798c41f4566d0a0bf8a42
Update help text.

diff -r a17fbdd7b47a -r d6d7aa386bad tools/naive_variant_detector.xml
--- a/tools/naive_variant_detector.xml Mon Aug 26 12:45:02 2013 -0400
+++ b/tools/naive_variant_detector.xml Mon Aug 26 17:47:25 2013 -0400
@@ -1,5 +1,5 @@
-on BAM files
+tabulate variable sites from BAM datasets
 numpy
 pyBamParser
@@ -94,8 +94,8 @@
-
-
+
+
@@ -107,19 +107,74 @@
 **What it does**
 
-This tool is a naive variant detector.
+This tool is a naive variant caller that processes aligned sequencing reads in BAM format and produces a VCF file containing per-position variant calls. Multiple BAM files can be provided as input, and read group information is used to make calls for individual samples.
+
+User-configurable options allow filtering out reads that do not meet a mapping quality threshold, ignoring bases that do not meet a base quality threshold, and requiring a minimum number of reads per allele; users can also specify the ploidy and whether to consider each strand separately.
+
+In addition to calling alternate alleles based upon simple ratios of nucleotides at a position, per-base nucleotide counts are also provided. A custom tag, NC, is used within the genotype fields. The NC field is a comma-separated list of nucleotide counts in the form <nucleotide>=<count>; if the "Report counts by strand" option was selected, a plus or minus character is prepended to indicate the strand.
+
 ------
 
 **Inputs**
 
-Accepts one or more BAM input files.
+Accepts one or more BAM input files and a reference genome, either from the built-in list or from a FASTA file in your history.
 
 **Outputs**
 
 The output is in VCF format.
 
+**Options**
+
+Reference Genome:
+
+    Ensure that you have selected the correct reference genome, either from the list of built-in genomes or by selecting the corresponding FASTA file from your history.
+
+Restrict to regions:
+
+    You can specify any number of regions for which you would like to receive results. Each region can be given as just a chromosome name, as a chromosome name and a start position, or as a chromosome name with both start and end positions.
+
+Minimum number of reads needed to consider a REF/ALT:
+
+    The minimum number of reads that must contain a particular base at a position for that allele to be listed and used in genotype calls. Default is 0.
+
+Minimum base quality:
+
+    The minimum base quality score needed for a position in a read to be used for nucleotide counts and genotyping. Default is no filter.
+
+Minimum mapping quality:
+
+    The minimum mapping quality score needed to consider a read for nucleotide counts and genotyping. Default is no filter.
+
+Ploidy:
+
+    The number of genotype calls to make at each reported position.
+
+Only write out positions with possible alternate alleles:
+
+    When set, only positions that have at least one non-reference nucleotide passing the declared filters will be present in the output.
+
+Report counts by strand:
+
+    When set, nucleotide counts (NC) will be reported with respect to the source strand of each aligned read, in the form <strand><BASE>=<COUNT>.
+
+Choose the dtype to use for storing coverage information:
+
+    This controls the maximum depth value that can be stored for each nucleotide/position/strand (strand applies only when counts are reported by strand). Smaller types require less memory but have lower maximum values.
+
+    +--------+----------------------+
+    | name   | max value            |
+    +========+======================+
+    | uint8  | 255                  |
+    +--------+----------------------+
+    | uint16 | 65535                |
+    +--------+----------------------+
+    | uint32 | 4294967295           |
+    +--------+----------------------+
+    | uint64 | 18446744073709551615 |
+    +--------+----------------------+
+
 ------
 
 **Citation**
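The NC genotype field described in the help text above can be consumed downstream with a small parser. The following is a minimal Python sketch, not part of the tool: `parse_nc` and `total_by_nucleotide` are hypothetical helper names. It assumes the value comes from a single sample's genotype column, that `.` marks a missing value as elsewhere in VCF, and that the caller knows whether "Report counts by strand" was enabled.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Key: (strand, nucleotide); strand is "+" or "-" for stranded output, else None.
NCKey = Tuple[Optional[str], str]


def parse_nc(field: str, stranded: bool = False) -> Dict[NCKey, int]:
    """Parse an NC value such as "A=10,C=2" or, when stranded, "+A=6,-A=4".

    Hypothetical helper; assumes "." marks a missing value, as in VCF.
    """
    counts: Dict[NCKey, int] = {}
    if not field or field == ".":
        return counts
    for item in field.split(","):
        allele, sep, count = item.partition("=")
        if not sep:
            raise ValueError(f"malformed NC entry: {item!r}")
        strand: Optional[str] = None
        if stranded:
            # The strand character is prepended to the nucleotide.
            strand, allele = allele[:1], allele[1:]
            if strand not in ("+", "-"):
                raise ValueError(f"missing strand in NC entry: {item!r}")
        key = (strand, allele)
        counts[key] = counts.get(key, 0) + int(count)
    return counts


def total_by_nucleotide(counts: Dict[NCKey, int]) -> Dict[str, int]:
    """Sum the counts for each nucleotide across both strands."""
    totals: Counter = Counter()
    for (_strand, nucleotide), n in counts.items():
        totals[nucleotide] += n
    return dict(totals)
```

For example, `parse_nc("+A=6,-A=4,+G=1", stranded=True)` yields `{('+', 'A'): 6, ('-', 'A'): 4, ('+', 'G'): 1}`, and passing that to `total_by_nucleotide` gives `{'A': 10, 'G': 1}`. The parser takes an explicit `stranded` flag rather than guessing from the first character, so an unstranded value is never misread.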
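The options "Minimum number of reads needed to consider a REF/ALT" and "Only write out positions with possible alternate alleles" interact. The Python sketch below illustrates how these two settings combine as documented in the help text; it is not the tool's code. The helper names are hypothetical. It assumes the counts are per-nucleotide totals at one position after quality filtering, and that only nucleotides actually observed (count above zero) are candidate alleles.

```python
from typing import Dict, Set


def passing_alleles(counts: Dict[str, int], min_count: int = 0) -> Set[str]:
    """Hypothetical helper: observed nucleotides with at least `min_count` reads."""
    return {nt for nt, n in counts.items() if n > 0 and n >= min_count}


def has_possible_alt(ref: str, counts: Dict[str, int], min_count: int = 0) -> bool:
    """Hypothetical helper: True if any non-reference nucleotide passes the filter."""
    return any(nt != ref for nt in passing_alleles(counts, min_count))
```

With counts `{"A": 10, "G": 2, "T": 1}` and reference `A`, a minimum of 2 keeps `A` and `G`, so the position would be reported. A minimum of 3 leaves only the reference, so the position would be omitted when the alternate-only option is set.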
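The "Restrict to regions" option accepts a chromosome alone, a chromosome with a start, or a chromosome with a start and end. The Python sketch below interprets these as a chromosome with an optional lower and upper bound. It assumes 1-based, inclusive coordinates (matching VCF POS), that an omitted start or end leaves that side unbounded, and that an empty region list means no restriction. `Region` and `in_regions` are hypothetical names, and the help text does not state the tool's actual coordinate convention.

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    """Hypothetical region record; coordinates assumed 1-based and inclusive."""
    chrom: str
    start: Optional[int] = None
    end: Optional[int] = None


def in_regions(chrom: str, pos: int, regions: Sequence[Region]) -> bool:
    """Return True if (chrom, pos) falls inside any requested region."""
    if not regions:
        return True  # assumption: no regions given means report everything
    for region in regions:
        if region.chrom != chrom:
            continue
        if region.start is not None and pos < region.start:
            continue
        if region.end is not None and pos > region.end:
            continue
        return True
    return False
```

For instance, `Region("chr2", 100)` covers every position on `chr2` from 100 onward, while `Region("chr1")` covers the whole chromosome.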
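The limits in the dtype table are the largest values representable by unsigned integers of 8, 16, 32 and 64 bits, that is 2^n - 1. The Python sketch below derives them and picks the smallest listed type for an expected maximum per-nucleotide depth. `smallest_dtype` is a hypothetical helper, and the help text does not say what the tool does when a count exceeds the chosen limit.

```python
# Unsigned integer dtypes listed in the help text, with their bit widths.
DTYPE_BITS = (("uint8", 8), ("uint16", 16), ("uint32", 32), ("uint64", 64))


def max_value(bits: int) -> int:
    """Largest value representable by an unsigned integer of `bits` bits."""
    return 2 ** bits - 1


def smallest_dtype(expected_max_depth: int) -> str:
    """Hypothetical helper: smallest listed dtype whose limit covers the depth."""
    for name, bits in DTYPE_BITS:
        if expected_max_depth <= max_value(bits):
            return name
    raise ValueError(f"depth {expected_max_depth} exceeds every listed dtype")
```

For example, an expected depth of 70000 exceeds the uint16 limit of 65535, so `smallest_dtype(70000)` returns `"uint32"`. Each step up the table doubles the bytes used per stored count (1, 2, 4, then 8).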