annotate infer_experiment.xml @ 1:dc3b3b88fbab

first commit
author nilesh
date Thu, 18 Jul 2013 11:27:43 -0500
parents
children
<tool id="infer_experiment" name="Infer Experiment">
    <description>speculates how RNA-seq sequencing was configured</description>
    <requirements>
        <requirement type="package" version="2.3.7">rseqc</requirement>
    </requirements>
    <command interpreter="python"> infer_experiment.py -i $input -r $refgene

        #if $sample_size.boolean
            -s $sample_size.size
        #end if

        > $output
    </command>
    <inputs>
        <param name="input" type="data" format="bam,sam" label="Input BAM/SAM file" />
        <param name="refgene" type="data" format="bed" label="Reference gene model in bed format" />
        <conditional name="sample_size">
            <param name="boolean" type="boolean" label="Modify usable sampled reads" value="false" />
            <when value="true">
                <param name="size" type="integer" label="Number of usable sampled reads (default = 200000)" value="200000" />
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data format="txt" name="output" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="Pairend_nonStrandSpecific_36mer_Human_hg19.bam" />
            <param name="refgene" value="hg19_RefSeq.bed" />
            <output name="output" file="inferexpout.txt" />
        </test>
    </tests>
    <help>
.. image:: https://code.google.com/p/rseqc/logo?cct=1336721062

-----

About RSeQC
+++++++++++

The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequencing data, especially RNA-seq data. “Basic modules” quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while “RNA-seq specific modules” investigate the sequencing saturation status of both splice junction detection and expression estimation, the clipping profile of mapped reads, the distribution of mapped reads, coverage uniformity over the gene body, reproducibility, strand specificity and splice junction annotation.

The RSeQC package is licensed under the GNU GPL v3 license.

Inputs
++++++++++++++

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format.

Number of usable sampled reads (default=200000)
    Number of usable reads sampled from the SAM/BAM file. Sampling more reads gives a more accurate estimate, but makes the program a little slower.


Output
++++++++++++++
This program is used to speculate how RNA-seq sequencing was configured, especially how reads were stranded for strand-specific RNA-seq data, by comparing the reads' mapping information to the underlying gene model. Generally, strand-specific RNA-seq data should be handled differently in both visualization and RPKM calculation.

For paired-end RNA-seq, there are two different ways to strand reads:

1) 1++,1--,2+-,2-+
    - read1 mapped to '+' strand indicates parental gene on '+' strand
    - read1 mapped to '-' strand indicates parental gene on '-' strand
    - read2 mapped to '+' strand indicates parental gene on '-' strand
    - read2 mapped to '-' strand indicates parental gene on '+' strand
2) 1+-,1-+,2++,2--
    - read1 mapped to '+' strand indicates parental gene on '-' strand
    - read1 mapped to '-' strand indicates parental gene on '+' strand
    - read2 mapped to '+' strand indicates parental gene on '+' strand
    - read2 mapped to '-' strand indicates parental gene on '-' strand

For single-end RNA-seq, there are also two different ways to strand reads:

1) ++,--
    - read mapped to '+' strand indicates parental gene on '+' strand
    - read mapped to '-' strand indicates parental gene on '-' strand
2) +-,-+
    - read mapped to '+' strand indicates parental gene on '-' strand
    - read mapped to '-' strand indicates parental gene on '+' strand

Example Output
++++++++++++++

**Example1** ::

    =========================================================
    This is PairEnd Data

    Fraction of reads explained by "1++,1--,2+-,2-+": 0.4992
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.5008
    Fraction of reads explained by other combinations: 0.0000
    =========================================================

*Conclusion*: We can infer that this is NOT strand-specific data, because about 50% of reads can be explained by "1++,1--,2+-,2-+", while the other 50% can be explained by "1+-,1-+,2++,2--".

**Example2** ::

    ============================================================
    This is PairEnd Data

    Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.0356
    Fraction of reads explained by other combinations: 0.0000
    ============================================================

*Conclusion*: We can infer that this is strand-specific RNA-seq data. The strandedness of read1 is consistent with that of the gene model, while the strandedness of read2 is opposite to the strand of the reference gene model.

**Example3** ::

    =========================================================
    This is SingleEnd Data

    Fraction of reads explained by "++,--": 0.9840
    Fraction of reads explained by "+-,-+": 0.0160
    Fraction of reads explained by other combinations: 0.0000
    =========================================================

*Conclusion*: This is single-end, strand-specific RNA-seq data. The strandedness of the reads is concordant with that of the reference gene.
    </help>
</tool>
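The conclusions drawn in the help text's examples can be sketched as a small post-processing step on the tool's text output. This is only an illustration, not part of the wrapper or of RSeQC: the function names `parse_fractions` and `classify` are hypothetical, and the 0.8 cutoff is an assumed threshold chosen for the example, since the help text gives no numeric rule.

```python
import re

# Matches lines such as:
#   Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644
# The unquoted "other combinations" line is deliberately not matched.
FRACTION_RE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')


def parse_fractions(report_text):
    """Return {strand_pattern: fraction} for the quoted patterns in a report."""
    return {m.group(1): float(m.group(2)) for m in FRACTION_RE.finditer(report_text)}


def classify(report_text, cutoff=0.8):
    """Guess strand specificity from a report.

    cutoff is an assumed, illustrative threshold: if one pattern explains at
    least this fraction of reads, the data is treated as strand-specific.
    """
    fractions = parse_fractions(report_text)
    if not fractions:
        raise ValueError("no strand fractions found in report")
    pattern, fraction = max(fractions.items(), key=lambda item: item[1])
    if fraction >= cutoff:
        return "strand-specific", pattern
    return "not strand-specific", None


if __name__ == "__main__":
    example2 = (
        "This is PairEnd Data\n"
        'Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644\n'
        'Fraction of reads explained by "1+-,1-+,2++,2--": 0.0356\n'
        "Fraction of reads explained by other combinations: 0.0000\n"
    )
    print(classify(example2))
```

A fixed cutoff is a simplification. Intermediate fractions may call for inspecting the report by hand rather than trusting an automatic label.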