annotate README.rst @ 21:69d5400f3186 default tip

update readme
author Daniel Blankenberg <dan@bx.psu.edu>
date Tue, 27 Aug 2013 14:54:49 -0400
parents 66a4325d9394
children
This repository contains the **Naive Variant Caller** tool.

------

**What it does**

This tool is a naive variant caller that processes aligned sequencing reads in BAM format and produces a VCF file containing per-position variant calls. Multiple BAM files can be provided as input, and read group information is used to make calls for individual samples.

User-configurable options allow filtering out reads that do not pass mapping or base quality thresholds and setting a minimum per-base read depth; users can also specify the ploidy and whether to consider each strand separately.

In addition to calling alternate alleles based upon simple ratios of nucleotides at a position, per-base nucleotide counts are also provided. A custom tag, NC, is used within the Genotype fields. The NC field is a comma-separated list of nucleotide counts in the form ``<nucleotide>=<count>``; if the strandedness option was specified, a plus or minus character is prepended to each nucleotide to indicate the strand.
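
As a rough illustration of the NC layout described above, the following sketch parses an NC value into a dictionary. The function name ``parse_nc`` and the ``(strand, nucleotide)`` key layout are assumptions made for this example and are not part of the tool; the two input strings are the NC values from the example VCF lines in the Outputs section.

```python
def parse_nc(nc_field):
    """Parse an NC value into a {(strand, nucleotide): count} dict.

    strand is "+" or "-" when counts were reported by strand, else None.
    This is a hypothetical helper, not code from the tool itself.
    """
    counts = {}
    for item in nc_field.split(","):
        if not item:  # the example lines end with a trailing comma
            continue
        key, _, value = item.partition("=")
        strand = None
        if key[0] in "+-":  # optional strand prefix
            strand, key = key[0], key[1:]
        counts[(strand, key)] = int(value)
    return counts


unstranded = parse_nc("A=9,C=5,T=9629,G=15,")
stranded = parse_nc("+T=3972,-A=9,-C=5,-T=5657,-G=15,")
```

Empty items are skipped so that the trailing comma seen in the example output does not raise an error.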


------

**Inputs**

Accepts one or more BAM input files and a reference genome from the built-in list or from a FASTA file in your history.


**Outputs**

The output is in VCF format.

Example VCF output line, without reporting by strand:
``chrM 16029 . T G,A,C . . AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 GT:AC:AF:NC 0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:A=9,C=5,T=9629,G=15,``
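
In this example line, each AF value equals the corresponding AC count divided by the sum of the NC counts. The sketch below checks that arithmetic; it is an observation about this one line, not a description of how the tool computes AF internally, and the function name ``allele_frequencies`` is made up for the example.

```python
import math


def allele_frequencies(nc_counts, ref):
    """Return {alt_base: count / total_depth} for every non-reference base.

    Hypothetical helper: assumes AF is count over summed NC depth, which
    matches the example line but is not confirmed by the tool's source.
    """
    depth = sum(nc_counts.values())
    return {base: n / depth for base, n in nc_counts.items() if base != ref}


nc = {"A": 9, "C": 5, "T": 9629, "G": 15}  # NC field of the example line
reported_af = {
    "G": 0.00155311658729,
    "A": 0.000931869952371,
    "C": 0.000517705529095,
}
computed_af = allele_frequencies(nc, ref="T")
matches = all(
    math.isclose(computed_af[base], reported_af[base], rel_tol=1e-6)
    for base in reported_af
)
```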

Example VCF output line, when reporting by strand:
``chrM 16029 . T G,A,C . . AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 GT:AC:AF:NC 0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:+T=3972,-A=9,-C=5,-T=5657,-G=15,``
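
The two example lines describe the same position, so summing the stranded counts per nucleotide should give back the unstranded counts (for instance, ``+T=3972`` plus ``-T=5657`` gives ``T=9629``). A minimal sketch of that check, using a hypothetical helper named ``collapse_strands``:

```python
from collections import Counter


def collapse_strands(stranded_counts):
    """Sum per-strand counts such as {"+T": 3972, "-T": 5657} per nucleotide."""
    totals = Counter()
    for key, count in stranded_counts.items():
        totals[key.lstrip("+-")] += count  # drop the strand prefix
    return dict(totals)


stranded = {"+T": 3972, "-A": 9, "-C": 5, "-T": 5657, "-G": 15}
collapsed = collapse_strands(stranded)
```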

**Options**

Reference Genome:

Ensure that you have selected the correct reference genome, either from the list of built-in genomes or by selecting the corresponding FASTA file from your history.

Restrict to regions:

You can specify any number of regions for which you would like to receive results. Each region can be given as a chromosome name alone, a chromosome name and start position, or a chromosome name with start and end positions.
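
To make the three region forms concrete, here is a small sketch of a position-in-region test. The ``Region`` type, the ``in_regions`` function, the inclusive treatment of start and end, and the rule that an empty region list means "no restriction" are all assumptions for illustration; the tool's actual coordinate conventions are not stated here.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Hypothetical region: chromosome, optional start, optional end."""
    chrom: str
    start: Optional[int] = None
    end: Optional[int] = None


def in_regions(chrom, pos, regions):
    """Return True if (chrom, pos) falls inside any region.

    Assumes inclusive start/end and that an empty list means no restriction.
    """
    if not regions:
        return True
    for region in regions:
        if region.chrom != chrom:
            continue
        if region.start is not None and pos < region.start:
            continue
        if region.end is not None and pos > region.end:
            continue
        return True
    return False


regions = [Region("chrM"), Region("chr1", 1000), Region("chr2", 500, 600)]
```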

Minimum number of reads needed to consider a REF/ALT:

The minimum number of reads that must contain a particular base at a position for that allele to be listed and used in genotype calls. Default is 0.
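
The effect of this threshold can be sketched as a simple filter over per-base counts. The function name ``alleles_passing_min_reads`` and the use of ``>=`` for the comparison are assumptions based on the word "minimum"; the counts are taken from the example output line.

```python
def alleles_passing_min_reads(counts, min_reads=0):
    """Keep only bases observed in at least min_reads reads (hypothetical)."""
    return {base: n for base, n in counts.items() if n >= min_reads}


counts = {"A": 9, "C": 5, "T": 9629, "G": 15}
kept_default = alleles_passing_min_reads(counts)          # default of 0
kept_strict = alleles_passing_min_reads(counts, min_reads=10)
```

With the assumed threshold of 10, only ``T`` and ``G`` would remain as candidate alleles at this position.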

Minimum base quality:

The minimum base quality score needed for the position in a read to be used for nucleotide counts and genotyping. Default is no filter.

Minimum mapping quality:

The minimum mapping quality score needed to consider a read for nucleotide counts and genotyping. Default is no filter.

Ploidy:

The number of genotype calls to make at each reported position.

Only write out positions with possible alternate alleles:

When set, only positions that have at least one non-reference nucleotide passing the declared filters will be present in the output.

Report counts by strand:

When set, nucleotide counts (NC) will be reported in reference to the aligned read's source strand. Reported as: ``<strand><BASE>=<COUNT>``.

Choose the dtype to use for storing coverage information:

This controls the maximum depth value for each nucleotide/position/strand (when specified). Smaller dtypes require less memory but have lower maximum values.

+--------+----------------------------+
| name   | maximum coverage value     |
+========+============================+
| uint8  | 255                        |
+--------+----------------------------+
| uint16 | 65,535                     |
+--------+----------------------------+
| uint32 | 4,294,967,295              |
+--------+----------------------------+
| uint64 | 18,446,744,073,709,551,615 |
+--------+----------------------------+
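
The maxima in the table are ``2**bits - 1`` for each unsigned integer width. The sketch below reproduces them and picks the smallest dtype name that can hold an expected maximum depth; ``max_coverage`` and ``smallest_dtype`` are hypothetical helpers for choosing a setting, not functions provided by the tool.

```python
# Bit widths of the dtype names listed in the table above.
DTYPE_BITS = {"uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64}


def max_coverage(dtype_name):
    """Largest count an unsigned integer of the given width can store."""
    return 2 ** DTYPE_BITS[dtype_name] - 1


def smallest_dtype(expected_max_depth):
    """Return the narrowest dtype name that can hold expected_max_depth."""
    for name in ("uint8", "uint16", "uint32", "uint64"):
        if expected_max_depth <= max_coverage(name):
            return name
    raise ValueError("depth exceeds the uint64 maximum")


choice = smallest_dtype(9629)  # the T count from the example output line
```

For a position like the example one, where a single nucleotide reaches a depth of 9629, ``uint8`` would be too small and ``uint16`` would be the narrowest sufficient choice.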


------

**Citation**

If you use this tool, please cite Blankenberg D, et al. *In preparation.*