comparison infer_experiment.xml @ 1:dc3b3b88fbab

first commit
author nilesh
date Thu, 18 Jul 2013 11:27:43 -0500
0:0d133c7c387e 1:dc3b3b88fbab
<tool id="infer_experiment" name="Infer Experiment">
    <description>speculates how RNA-seq experiments were configured</description>
    <requirements>
        <requirement type="package" version="2.3.7">rseqc</requirement>
    </requirements>
    <command interpreter="python"> infer_experiment.py -i $input -r $refgene

        #if $sample_size.boolean
            -s $sample_size.size
        #end if

        > $output
    </command>
    <inputs>
        <param name="input" type="data" format="bam,sam" label="Input BAM/SAM file" />
        <param name="refgene" type="data" format="bed" label="Reference gene model in bed format" />
        <conditional name="sample_size">
            <param name="boolean" type="boolean" label="Modify usable sampled reads" value="false" />
            <when value="true">
                <param name="size" type="integer" label="Number of usable sampled reads (default = 200000)" value="200000" />
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data format="txt" name="output" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="Pairend_nonStrandSpecific_36mer_Human_hg19.bam" />
            <param name="refgene" value="hg19_RefSeq.bed" />
            <output name="output" file="inferexpout.txt" />
        </test>
    </tests>
    <help>
.. image:: https://code.google.com/p/rseqc/logo?cct=1336721062

-----

About RSeQC
+++++++++++

The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data. “Basic modules” quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while “RNA-seq specific modules” investigate sequencing saturation status of both splicing junction detection and expression estimation, mapped reads clipping profile, mapped reads distribution, coverage uniformity over the gene body, reproducibility, strand specificity and splice junction annotation.

The RSeQC package is licensed under the GNU GPL v3 license.

Inputs
++++++++++++++

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format.

Number of usable sampled reads (default=200000)
    Number of usable reads sampled from the SAM/BAM file. More reads give a more accurate estimate but make the program slightly slower.
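
As a rough illustration of why more sampled reads give a more accurate estimate: if the usable reads are assumed to be sampled independently (an assumption made here, not a description of how infer_experiment.py samples), each reported fraction behaves like a binomial proportion whose standard error shrinks with the square root of the sample size. The function ``fraction_standard_error`` is hypothetical and is not part of RSeQC.

```python
import math


def fraction_standard_error(p, n):
    """Binomial standard error of a fraction p estimated from n sampled reads.

    Assumes independent sampling; this is an illustrative model only.
    """
    if not n > 0:
        raise ValueError("n must be positive")
    if not (p >= 0.0 and 1.0 >= p):
        raise ValueError("p must lie in [0, 1]")
    return math.sqrt(p * (1.0 - p) / n)


# Worst case (p = 0.5) for the default sample size and a ten-fold smaller one.
for n in (200000, 20000):
    print(n, round(fraction_standard_error(0.5, n), 4))
```

Under this model, the default of 200000 reads gives a worst-case standard error of about 0.0011, while 20000 reads raise it to about 0.0035.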


Output
++++++++++++++

This program infers how the RNA-seq sequencing was configured, in particular how reads were stranded for strand-specific RNA-seq data, by comparing the reads' mapping information to the underlying gene model. Strand-specific RNA-seq data should generally be handled differently from non-strand-specific data in both visualization and RPKM calculation.

For paired-end RNA-seq, there are two different ways to strand reads:

1) 1++,1--,2+-,2-+

   - read1 mapped to '+' strand indicates parental gene on '+' strand
   - read1 mapped to '-' strand indicates parental gene on '-' strand
   - read2 mapped to '+' strand indicates parental gene on '-' strand
   - read2 mapped to '-' strand indicates parental gene on '+' strand

2) 1+-,1-+,2++,2--

   - read1 mapped to '+' strand indicates parental gene on '-' strand
   - read1 mapped to '-' strand indicates parental gene on '+' strand
   - read2 mapped to '+' strand indicates parental gene on '+' strand
   - read2 mapped to '-' strand indicates parental gene on '-' strand
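
The two paired-end rules above can be read as lookup tables keyed by read number, mapped strand and parental-gene strand. This minimal sketch encodes each rule as a set of such keys and reports which rule a single read agrees with; the names ``PE_RULE_1``, ``PE_RULE_2`` and ``classify_paired`` are hypothetical, and the code only restates the listed rules rather than reproducing RSeQC's implementation.

```python
# Each key concatenates the read number, the strand the read mapped to, and
# the strand of its parental gene; "1+-" means read1 on '+', gene on '-'.
PE_RULE_1 = {"1++", "1--", "2+-", "2-+"}
PE_RULE_2 = {"1+-", "1-+", "2++", "2--"}


def classify_paired(read_number, read_strand, gene_strand):
    """Return the paired-end rule that a single read agrees with."""
    if read_number not in (1, 2):
        raise ValueError("read_number must be 1 or 2")
    if read_strand not in ("+", "-") or gene_strand not in ("+", "-"):
        raise ValueError("strands must be '+' or '-'")
    key = f"{read_number}{read_strand}{gene_strand}"
    # The eight valid keys are split evenly between the two disjoint rules.
    return "1++,1--,2+-,2-+" if key in PE_RULE_1 else "1+-,1-+,2++,2--"


print(classify_paired(1, "+", "+"))  # 1++,1--,2+-,2-+
print(classify_paired(2, "+", "+"))  # 1+-,1-+,2++,2--
```

Because the two sets are disjoint and together cover all eight valid keys, every valid read maps to exactly one rule in this sketch.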

For single-end RNA-seq, there are also two different ways to strand reads:

1) ++,--

   - read mapped to '+' strand indicates parental gene on '+' strand
   - read mapped to '-' strand indicates parental gene on '-' strand

2) +-,-+

   - read mapped to '+' strand indicates parental gene on '-' strand
   - read mapped to '-' strand indicates parental gene on '+' strand
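
For single-end data, the reported fractions can be sketched as shares of sampled reads: each read contributes its mapped strand and the strand of its parental gene, and the fraction for a rule is the share of reads whose strand pair belongs to that rule. The input format (a list of ``(read_strand, gene_strand)`` tuples) and the name ``single_end_fractions`` are assumptions for this illustration; here "other" only catches strand values outside '+' and '-'.

```python
from collections import Counter

SE_RULE_1 = {"++", "--"}  # read strand matches gene strand
SE_RULE_2 = {"+-", "-+"}  # read strand is opposite to gene strand


def single_end_fractions(pairs):
    """Fraction of (read_strand, gene_strand) pairs explained by each rule."""
    counts = Counter()
    for read_strand, gene_strand in pairs:
        key = read_strand + gene_strand
        if key in SE_RULE_1:
            counts["++,--"] += 1
        elif key in SE_RULE_2:
            counts["+-,-+"] += 1
        else:
            counts["other"] += 1  # only malformed strand values in this sketch
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to evaluate")
    return {rule: counts[rule] / total for rule in ("++,--", "+-,-+", "other")}


# Hypothetical toy sample: 9 reads agree with the gene strand, 1 does not.
sample = [("+", "+")] * 5 + [("-", "-")] * 4 + [("+", "-")]
print(single_end_fractions(sample))
```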

Example Output
++++++++++++++

**Example1** ::

    =========================================================
    This is PairEnd Data

    Fraction of reads explained by "1++,1--,2+-,2-+": 0.4992
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.5008
    Fraction of reads explained by other combinations: 0.0000
    =========================================================

*Conclusion*: We can infer that this is NOT strand-specific data, because about 50% of reads are explained by "1++,1--,2+-,2-+" while the other 50% are explained by "1+-,1-+,2++,2--".

**Example2** ::

    ============================================================
    This is PairEnd Data

    Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.0356
    Fraction of reads explained by other combinations: 0.0000
    ============================================================

*Conclusion*: We can infer that this is strand-specific RNA-seq data. The strandness of read1 is consistent with that of the gene model, while the strandness of read2 is opposite to the strand of the reference gene model.

**Example3** ::

    =========================================================
    This is SingleEnd Data

    Fraction of reads explained by "++,--": 0.9840
    Fraction of reads explained by "+-,-+": 0.0160
    Fraction of reads explained by other combinations: 0.0000
    =========================================================

*Conclusion*: This is single-end, strand-specific RNA-seq data. The strandness of reads is concordant with the strandness of the reference gene model.
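
The conclusions in the three examples follow a simple pattern: a roughly even split between the two rules suggests non-strand-specific data, while one dominant rule suggests strand-specific data and identifies which rule applies. This sketch turns that reasoning into a function; the name ``interpret_fractions`` and the ``cutoff`` (0.8) and ``tolerance`` (0.1) values are hypothetical choices for illustration, not thresholds defined by RSeQC.

```python
def interpret_fractions(rule_a, frac_a, rule_b, frac_b, cutoff=0.8, tolerance=0.1):
    """Summarize two rule fractions the way the example conclusions do.

    cutoff and tolerance are illustrative assumptions, not RSeQC defaults.
    """
    if frac_a >= cutoff:
        return f"strand-specific, explained by {rule_a}"
    if frac_b >= cutoff:
        return f"strand-specific, explained by {rule_b}"
    if tolerance >= abs(frac_a - frac_b):
        return "not strand-specific"
    return "inconclusive"


# Fractions copied from Example1 and Example2 above.
print(interpret_fractions("1++,1--,2+-,2-+", 0.4992, "1+-,1-+,2++,2--", 0.5008))
print(interpret_fractions("1++,1--,2+-,2-+", 0.9644, "1+-,1-+,2++,2--", 0.0356))
```

The explicit "inconclusive" branch keeps the sketch from forcing a verdict when neither rule dominates and the split is far from even.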
    </help>
</tool>