Mercurial > repos > yhoogstrate > edger_with_design_matrix
annotate edgeR_Differential_Gene_Expression.xml @ 93:31335aa52b2e draft
Uploaded
author | yhoogstrate |
---|---|
date | Wed, 18 Mar 2015 06:40:01 -0400 |
parents | c81da57fff20 |
children | 46745f5666ac |
<?xml version="1.0" encoding="UTF-8"?>
<tool id="edger_dge" name="edgeR: Differential Gene(Expression) Analysis" version="3.0.3-latest.d">
    <description>RNA-Seq gene expression analysis using edgeR (R package)</description>

    <requirements>
        <requirement type="package" version="3.0.3">R</requirement>
        <requirement type="package" version="latest">biocLite_edgeR_limma</requirement>
    </requirements>

    <version_command>R --vanilla --slave -e "library(edgeR) ; cat(sessionInfo()\$otherPkgs\$edgeR\$Version)" 2> /dev/null</version_command>

    <command>
        <!--
            The following script is written in the "Cheetah" language:
            http://www.cheetahtemplate.org/docs/users_guide_html_multipage/contents.html
        -->

        R --vanilla --slave -f $R_script '--args
            $expression_matrix
            $design_matrix
            $contrast

            $fdr

            $output_count_edgeR
            $output_cpm

            /dev/null <!-- Calculation of FPKM/RPKM should come here -->

            #if $output_raw_counts:
                $output_raw_counts
            #else:
                /dev/null
            #end if

            <!-- Changeset 89:875f080136b6 (yhoogstrate; parent 83): "Solved a very serious bug: if contrast and design matrix described samples not in the same order, statistical analysis goes wrong" -->
            #if $output_MDSplot_logFC:
                $output_MDSplot_logFC
            #else:
                /dev/null
            #end if

            #if $output_MDSplot_bcv:
                $output_MDSplot_bcv
            #else:
                /dev/null
            #end if

            #if $output_BCVplot:
                $output_BCVplot
            #else:
                /dev/null
            #end if

            #if $output_MAplot:
                $output_MAplot
            #else:
                /dev/null
            #end if

            #if $output_PValue_distribution_plot:
                $output_PValue_distribution_plot
            #else:
                /dev/null
            #end if

            #if $output_hierarchical_clustering_plot:
                $output_hierarchical_clustering_plot
            #else:
                /dev/null
            #end if

            #if $output_heatmap_plot:
                $output_heatmap_plot
            #else:
                /dev/null
            #end if

            #if $output_RData_obj:
                $output_RData_obj
            #else:
                /dev/null
            #end if

            $output_format_images
        '
        #if $output_R:
            > $output_R
        #else:
            > /dev/null
        #end if

        2> stderr.txt ;

        grep -v 'Calculating library sizes from column' stderr.txt > stderr2.txt ; rm stderr.txt ; mv stderr2.txt stderr.txt ;

        ## Locale error messages:
        grep -v 'During startup - Warning messages' stderr.txt > stderr2.txt ; rm stderr.txt ; mv stderr2.txt stderr.txt ;
        grep -v 'Setting LC_TIME failed' stderr.txt > stderr2.txt ; rm stderr.txt ; mv stderr2.txt stderr.txt ;
        grep -v 'Setting LC_MONETARY failed' stderr.txt > stderr2.txt ; rm stderr.txt ; mv stderr2.txt stderr.txt ;
        grep -v 'Setting LC_PAPER failed' stderr.txt > stderr2.txt ; rm stderr.txt ; mv stderr2.txt stderr.txt ;
        grep -v 'Setting LC_MEASUREMENT failed' stderr.txt > stderr2.txt ; rm stderr.txt ; mv stderr2.txt stderr.txt ;
        grep -v 'Setting LC_CTYPE failed' stderr.txt > stderr2.txt ; rm stderr.txt ; mv stderr2.txt stderr.txt ;
        grep -v 'Setting LC_COLLATE failed' stderr.txt > stderr2.txt ; rm stderr.txt ; mv stderr2.txt stderr.txt ;

        cat stderr.txt >&2
    </command>

    <inputs>
        <param name="expression_matrix" type="data" format="tabular" label="Expression (read count) matrix" />
        <param name="design_matrix" type="data" format="tabular" label="Design matrix" help="Ensure your sample names are identical to those in the expression matrix. Preferably, create the design matrix using 'edgeR: Design- from Expression matrix'." />

        <param name="contrast" type="text" label="Contrast (biological question)" help="e.g. 'tumor-normal' or '(G1+G2)/2-G3' using the factor levels from the design matrix. See the 'makeContrasts' documentation of the limma package for more information: http://www.bioconductor.org/packages/release/bioc/html/limma.html and http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf." />

        <param name="fdr" type="float" min="0" max="1" value="0.05" label="False Discovery Rate (FDR)" />

        <param name="outputs" type="select" label="Optional desired outputs" multiple="true" display="checkboxes">
            <option value="make_output_raw_counts">Raw counts table</option>
            <option value="make_output_MDSplot_logFC">MDS-plot (logFC-method)</option>
            <option value="make_output_MDSplot_bcv">MDS-plot (BCV-method; much slower)</option>
            <option value="make_output_BCVplot">BCV-plot</option>
            <option value="make_output_MAplot">MA-plot</option>
            <option value="make_output_PValue_distribution_plot">P-Value distribution plot</option>
            <option value="make_output_hierarchical_clustering_plot">Hierarchical clustering (under construction)</option>
            <option value="make_output_heatmap_plot">Heatmap</option>

            <option value="make_output_R_stdout">R stdout</option>
            <option value="make_output_RData_obj">R Data object</option>
        </param>

        <param name="output_format_images" type="select" label="Output format of images" display="radio">
            <option value="png">Portable network graphics (.png)</option>
            <option value="pdf">Portable document format (.pdf)</option>
            <option value="svg">Scalable vector graphics (.svg)</option>
        </param>
    </inputs>

    <configfiles>
        <configfile name="R_script">
library(limma,quietly=TRUE) ## enable quietly to avoid unnecessary stderr dumping
library(edgeR,quietly=TRUE) ## enable quietly to avoid unnecessary stderr dumping
library(splines,quietly=TRUE) ## enable quietly to avoid unnecessary stderr dumping

## Fetch commandline arguments
args <- commandArgs(trailingOnly = TRUE)

expression_matrix_file = args[1]
design_matrix_file = args[2]
contrast = args[3]

fdr = as.numeric(args[4]) ## commandArgs() returns strings; convert so FDR values are compared numerically

output_count_edgeR = args[5]
output_cpm = args[6]

output_xpkm = args[7] ##FPKM file - yet to be implemented

output_raw_counts = args[8]
output_MDSplot_logFC = args[9]
output_MDSplot_bcv = args[10]
output_BCVplot = args[11]
output_MAplot = args[12]
output_PValue_distribution_plot = args[13]
output_hierarchical_clustering_plot = args[14]
output_heatmap_plot = args[15]
output_RData_obj = args[16]
output_format_images = args[17]


library(edgeR)
##raw_data <- read.delim(designmatrix,header=T,stringsAsFactors=T)
## Obtain read-counts

expression_matrix <- read.delim(expression_matrix_file,header=T,stringsAsFactors=F,row.names=1,check.names=FALSE,na.strings=c(""))
design_matrix <- read.delim(design_matrix_file,header=T,stringsAsFactors=F,row.names=1,check.names=FALSE,na.strings=c(""))

colnames(design_matrix) <- make.names(colnames(design_matrix))

for(i in 1:ncol(design_matrix)) {
    old <- design_matrix[,i]
    design_matrix[,i] <- make.names(design_matrix[,i])
    if(paste(design_matrix[,i],collapse="\t") != paste(old,collapse="\t")) {
        print("Renaming of factors:")
        print(old)
        print("To:")
        print(design_matrix[,i])
    }
    ## The following line appears to break the script:
    ##design_matrix[,i] <- as.factor(design_matrix[,i])
}

## 1) In the expression matrix, keep only the samples described in the design matrix
columns <- match(rownames(design_matrix),colnames(expression_matrix))
columns <- columns[!is.na(columns)]
read_counts <- expression_matrix[,columns]

## 2) In the design matrix, keep only the samples for which counts are available
columns <- match(colnames(read_counts),rownames(design_matrix))
columns <- columns[!is.na(columns)]
design_matrix <- design_matrix[columns,,drop=FALSE]
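
## Suggested sanity check (a sketch, not part of the original script): after
## steps 1) and 2) the count columns and the design rows should name the same
## samples in the same order, because glmFit() pairs count columns with design
## rows by position only. Stopping here is assumed to be safer than silently
## fitting a model to misaligned samples. Note that a single matching sample
## also trips this check, since read_counts then collapses to a vector without
## column names.
if(!identical(colnames(read_counts), rownames(design_matrix))) {
    stop("Samples in the expression matrix and the design matrix are not in the same order")
}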

## Filter for HTSeq predefined counts:
exclude_HTSeq <- c("no_feature","ambiguous","too_low_aQual","not_aligned","alignment_not_unique")
exclude_DEXSeq <- c("_ambiguous","_empty","_lowaqual","_notaligned")

exclude <- match(c(exclude_HTSeq, exclude_DEXSeq),rownames(read_counts))
exclude <- exclude[!is.na(exclude)]
if(length(exclude) != 0) {
    read_counts <- read_counts[-exclude,]
}


## Disabled: reorder read_counts to match the sample order of the design matrix (step 2 above already aligns the two)
##order <- match(colnames(read_counts) , rownames(design_matrix))
##read_counts_ordered <- read_counts[,order]

empty_samples <- apply(read_counts,2,function(x) sum(x) == 0)
if(sum(empty_samples) > 0) {
    write(paste("There are ",sum(empty_samples)," empty samples found:",sep=""),stderr())
    write(colnames(read_counts)[empty_samples],stderr())
} else {

    dge <- DGEList(counts=read_counts,genes=rownames(read_counts))

    formula <- paste(c("~0",make.names(colnames(design_matrix))),collapse = " + ")
    design_matrix_tmp <- design_matrix
    colnames(design_matrix_tmp) <- make.names(colnames(design_matrix_tmp))
    design <- model.matrix(as.formula(formula),design_matrix_tmp)
    rm(design_matrix_tmp)

    # Strip the variable-name prefixes that model.matrix() adds to the design column names
    prefixes = colnames(design_matrix)[attr(design,"assign")]
    avoid = nchar(prefixes) == nchar(colnames(design))
    replacements = substr(colnames(design),nchar(prefixes)+1,nchar(colnames(design)))
    replacements[avoid] = colnames(design)[avoid]
    colnames(design) = replacements
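
    ## Worked example (illustrative; the column and level names are hypothetical):
    ## for a design column "condition" holding the values "normal" and "tumor",
    ## model.matrix(~0 + condition) names its columns "conditionnormal" and
    ## "conditiontumor" and sets attr(design,"assign") to c(1,1). The lines above
    ## strip the "condition" prefix, leaving "normal" and "tumor", so a contrast
    ## such as 'tumor-normal' can refer to the factor levels directly. Numeric
    ## covariates, whose model.matrix column name equals the variable name, are
    ## left unchanged through the 'avoid' mask.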

    # Do normalization
    write("Calculating normalization factors...",stdout())
    dge <- calcNormFactors(dge)
    write("Estimating common dispersion...",stdout())
    dge <- estimateGLMCommonDisp(dge,design)
    write("Estimating trended dispersion...",stdout())
    dge <- estimateGLMTrendedDisp(dge,design)
    write("Estimating tagwise dispersion...",stdout())
    dge <- estimateGLMTagwiseDisp(dge,design)


    if(output_MDSplot_logFC != "/dev/null") {
        write("Creating MDS plot (logFC method)",stdout())
        points <- plotMDS.DGEList(dge,top=500,labels=rep("",nrow(dge\$samples)))# Get coordinates of inflexible plot
        dev.off()# Kill it

        if(output_format_images == "pdf") {
            pdf(output_MDSplot_logFC,height=14,width=14)
        } else if(output_format_images == "svg") {
            svg(output_MDSplot_logFC,height=14,width=14)
        } else {
            ## png(output_MDSplot_logFC)
            ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/

            bitmap(output_MDSplot_logFC,type="png16m",height=14,width=14)
        }


        diff_x <- abs(max(points\$x)-min(points\$x))
        diff_y <-(max(points\$y)-min(points\$y))
        plot(c(min(points\$x),max(points\$x) + 0.45 * diff_x), c(min(points\$y) - 0.05 * diff_y,max(points\$y) + 0.05 * diff_y), main="edgeR logFC-MDS Plot on top 500 genes",type="n", xlab="Leading logFC dim 1", ylab="Leading logFC dim 2")
        points(points\$x,points\$y,pch=20)
        text(points\$x, points\$y,rownames(dge\$samples),cex=1.25,col="gray",pos=4)
        rm(diff_x,diff_y)

        dev.off()
    }

    if(output_MDSplot_bcv != "/dev/null") {
        write("Creating MDS plot (bcv method)",stdout())

        ## 1. First create a virtual plot to obtain the desired coordinates
        pdf("bcvmds.pdf")
        points <- plotMDS.DGEList(dge,method="bcv",top=500,labels=rep("",nrow(dge\$samples)))
        dev.off()# Kill it

        ## 2. Re-plot the coordinates in a new figure with the desired size and settings.
        if(output_format_images == "pdf") {
            pdf(output_MDSplot_bcv,height=14,width=14)
        } else if(output_format_images == "svg") {
            svg(output_MDSplot_bcv,height=14,width=14)
        } else {
            ## png(output_MDSplot_bcv)
            ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/

            bitmap(output_MDSplot_bcv,type="png16m",height=14,width=14)
        }

        diff_x <- abs(max(points\$x)-min(points\$x))
        diff_y <-(max(points\$y)-min(points\$y))
        plot(c(min(points\$x),max(points\$x) + 0.45 * diff_x), c(min(points\$y) - 0.05 * diff_y,max(points\$y) + 0.05 * diff_y), main="edgeR BCV-MDS Plot",type="n", xlab="Leading BCV dim 1", ylab="Leading BCV dim 2")
        points(points\$x,points\$y,pch=20)
        text(points\$x, points\$y,rownames(dge\$samples),cex=1.25,col="gray",pos=4)
        rm(diff_x,diff_y)

        dev.off()
    }


    if(output_BCVplot != "/dev/null") {
        write("Creating Biological coefficient of variation plot",stdout())

        if(output_format_images == "pdf") {
            pdf(output_BCVplot)
        } else if(output_format_images == "svg") {
            svg(output_BCVplot)
        } else {
            ## png(output_BCVplot)
            ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/

            bitmap(output_BCVplot,type="png16m")
        }

        plotBCV(dge, cex=0.4, main="edgeR: Biological coefficient of variation (BCV) vs abundance")
        dev.off()
    }


    write("Fitting GLM...",stdout())
    fit <- glmFit(dge,design)

    write(paste("Performing likelihood ratio test: ",contrast,sep=""),stdout())
    cont <- c(contrast)
    cont <- makeContrasts(contrasts=cont, levels=design)

    lrt <- glmLRT(fit, contrast=cont[,1])
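
    ## Illustration (hypothetical level names): if the design columns are
    ## "normal" and "tumor", makeContrasts(contrasts="tumor-normal", levels=design)
    ## returns a one-column matrix with weight 1 for "tumor" and -1 for "normal".
    ## glmLRT() then tests the null hypothesis that this combination of GLM
    ## coefficients is zero, i.e. that the log-fold change between tumor and
    ## normal is zero.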
    write(paste("Exporting to file: ",output_count_edgeR,sep=""),stdout())
    write.table(file=output_count_edgeR,topTags(lrt,n=nrow(read_counts))\$table,sep="\t",row.names=TRUE,col.names=NA)
    write.table(file=output_cpm,cpm(dge,normalized.lib.sizes=TRUE),sep="\t",row.names=TRUE,col.names=NA)

    ## todo EXPORT FPKM
    write.table(file=output_raw_counts,dge\$counts,sep="\t",row.names=TRUE,col.names=NA)

    if(output_MAplot != "/dev/null" || output_PValue_distribution_plot != "/dev/null") {
        etable <- topTags(lrt, n=nrow(dge))\$table
        etable <- etable[order(etable\$FDR), ]

        if(output_MAplot != "/dev/null") {
            write("Creating MA plot...",stdout())

            if(output_format_images == "pdf") {
                pdf(output_MAplot)
            } else if(output_format_images == "svg") {
                svg(output_MAplot)
            } else {
                ## png(output_MAplot)
                ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/

                bitmap(output_MAplot,type="png16m")
            }

            with(etable, plot(logCPM, logFC, pch=20, main="edgeR: Fold change vs abundance"))
            with(subset(etable, FDR < fdr), points(logCPM, logFC, pch=20, col="red"))
            abline(h=c(-1,1), col="blue")
            dev.off()
        }

        if(output_PValue_distribution_plot != "/dev/null") {
            write("Creating P-value distribution plot...",stdout())

            if(output_format_images == "pdf") {
                pdf(output_PValue_distribution_plot,width=14,height=14)
            } else if(output_format_images == "svg") {
                svg(output_PValue_distribution_plot,width=14,height=14)
            } else {
                ## png(output_PValue_distribution_plot)
                ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/

                bitmap(output_PValue_distribution_plot,type="png16m",width=14,height=14)
            }

            expressed_genes <- subset(etable, PValue < 0.99)
            h <- hist(expressed_genes\$PValue,breaks=nrow(expressed_genes)/15,main="Binned P-Values (< 0.99)")
            center <- sum(h\$counts) / length(h\$counts)
            lines(c(0,1),c(center,center),lty=2,col="red",lwd=2)
            k <- ksmooth(h\$mid, h\$counts)
            lines(k\$x,k\$y,col="red",lwd=2)
            rmsd <- (h\$counts) - center
            rmsd <- rmsd^2
            rmsd <- sum(rmsd)
            rmsd <- sqrt(rmsd)
            text(0,max(h\$counts),paste("e=",round(rmsd,2),sep=""),pos=4,col="blue")
            ## change e into epsilon somehow
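            ## One possible way to address the note above (an untested suggestion,
            ## not part of the original script): plotmath draws the symbol
            ## 'epsilon' as a Greek letter when the label is an expression, e.g.
            ## text(0,max(h\$counts),bquote(epsilon == .(round(rmsd,2))),pos=4,col="blue")
            ## Note that 'rmsd' as computed above is sqrt(sum((counts - center)^2)),
            ## i.e. the Euclidean distance of the bin counts from the flat line,
            ## without dividing by the number of bins.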
            dev.off()
        }
    }

    if(output_heatmap_plot != "/dev/null") {

        if(output_format_images == "pdf") {
            pdf(output_heatmap_plot,width=10.5)
        } else if(output_format_images == "svg") {
            svg(output_heatmap_plot,width=10.5)
        } else {
            ## png(output_heatmap_plot)
            ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/

            bitmap(output_heatmap_plot,type="png16m",width=10.5)
        }

        etable2 <- topTags(lrt, n=100)\$table
        order <- rownames(etable2)
        cpm_sub <- cpm(dge,normalized.lib.sizes=TRUE,log=TRUE)[as.numeric(order),]
        heatmap(t(cpm_sub))
        dev.off()
    }

    ## TODO: hierarchical clustering plot (output_hierarchical_clustering_plot = args[14]) is not implemented yet

    if(output_RData_obj != "/dev/null") {
        save.image(output_RData_obj)
    }

    write("Done!",stdout())
}
        </configfile>
    </configfiles>

    <outputs>
        <data format="tabular" name="output_count_edgeR" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - differentially expressed genes" />
        <data format="tabular" name="output_cpm" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - CPM" />

        <data format="tabular" name="output_raw_counts" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - raw counts">
            <filter>outputs and ("make_output_raw_counts" in outputs)</filter>
        </data>

        <data format="png" name="output_MDSplot_logFC" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - MDS-plot (logFC method)">
            <filter>outputs and ("make_output_MDSplot_logFC" in outputs)</filter>

            <change_format>
                <when input="output_format_images" value="png" format="png" />
                <when input="output_format_images" value="pdf" format="pdf" />
                <when input="output_format_images" value="svg" format="svg" />
            </change_format>
        </data>

        <data format="png" name="output_MDSplot_bcv" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - MDS-plot (bcv method)">
            <filter>outputs and ("make_output_MDSplot_bcv" in outputs)</filter>
59 | 445 |
446 <change_format> | |
447 <when input="output_format_images" value="png" format="png" /> | |
448 <when input="output_format_images" value="pdf" format="pdf" /> | |
449 <when input="output_format_images" value="svg" format="svg" /> | |
450 </change_format> | |
25 | 451 </data> |
452 | |
        <data format="png" name="output_BCVplot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - BCV-plot">
            <filter>outputs and ("make_output_BCVplot" in outputs)</filter>

            <change_format>
                <when input="output_format_images" value="png" format="png" />
                <when input="output_format_images" value="pdf" format="pdf" />
                <when input="output_format_images" value="svg" format="svg" />
            </change_format>
        </data>

        <data format="png" name="output_MAplot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - MA-plot">
            <filter>outputs and ("make_output_MAplot" in outputs)</filter>

            <change_format>
                <when input="output_format_images" value="png" format="png" />
                <when input="output_format_images" value="pdf" format="pdf" />
                <when input="output_format_images" value="svg" format="svg" />
            </change_format>
        </data>

        <data format="png" name="output_PValue_distribution_plot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - P-Value distribution">
            <filter>outputs and ("make_output_PValue_distribution_plot" in outputs)</filter>

            <change_format>
                <when input="output_format_images" value="png" format="png" />
                <when input="output_format_images" value="pdf" format="pdf" />
                <when input="output_format_images" value="svg" format="svg" />
            </change_format>
        </data>

        <data format="png" name="output_hierarchical_clustering_plot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - Hierarchical clustering">
            <filter>outputs and ("make_output_hierarchical_clustering_plot" in outputs)</filter>

            <change_format>
                <when input="output_format_images" value="png" format="png" />
                <when input="output_format_images" value="pdf" format="pdf" />
                <when input="output_format_images" value="svg" format="svg" />
            </change_format>
        </data>

        <data format="png" name="output_heatmap_plot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - Heatmap">
            <filter>outputs and ("make_output_heatmap_plot" in outputs)</filter>

            <change_format>
                <when input="output_format_images" value="png" format="png" />
                <when input="output_format_images" value="pdf" format="pdf" />
                <when input="output_format_images" value="svg" format="svg" />
            </change_format>
        </data>

        <data format="RData" name="output_RData_obj" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - R data object">
            <filter>outputs and ("make_output_RData_obj" in outputs)</filter>
        </data>

        <data format="txt" name="output_R" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - R output (debug)" >
            <filter>outputs and ("make_output_R_stdout" in outputs)</filter>
        </data>
    </outputs>

    <help>
edgeR: Differential Gene(Expression) Analysis
#############################################

Overview
--------
edgeR performs differential expression analysis of RNA-seq and digital gene expression profiles with biological replication. It uses empirical Bayes estimation and exact tests based on the negative binomial distribution, and it is also useful for differential signal analysis with other types of genome-scale count data [1].

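For orientation, the negative binomial model mentioned above can be summarised by its mean-variance relationship. The following sketch uses the notation of the edgeR user's guide as we understand it, so check the guide for the exact parametrization. Here y_gi is the count of gene g in sample i, mu_gi is its expected value and phi_g is the gene's dispersion. The square root of the dispersion is the biological coefficient of variation (BCV) that the BCV-plot output displays.

```latex
\operatorname{Var}(y_{gi}) = \mu_{gi} + \phi_g \, \mu_{gi}^{2},
\qquad
\mathrm{BCV}_g = \sqrt{\phi_g}
```

Dividing the variance by mu_gi squared gives a squared coefficient of variation of 1/mu_gi + phi_g. For highly expressed genes, biological variation therefore dominates over Poisson counting noise.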
Every experiment requires a design matrix, which describes which samples belong to which groups.
More details are given in the edgeR user's guide: http://www.bioconductor.org/packages/2.12/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
and in the limma user's guide.

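The Python sketch below makes the idea concrete. It builds a "group means" design matrix from a list of conditions: one column per condition, with a 1 marking the group each sample belongs to. The helper design_matrix is hypothetical and is not part of this tool or of edgeR. The layout is similar in spirit to what R's model.matrix(~0 + Condition) produces, although the wrapper's own R code may construct its design differently.

```python
def design_matrix(conditions):
    """Hypothetical helper: one column per condition level, no intercept.

    Returns the column names (levels, sorted alphabetically) and one 0/1 row
    per sample, where the 1 marks the group that sample belongs to.
    """
    levels = sorted(set(conditions))
    rows = [[1 if condition == level else 0 for level in levels] for condition in conditions]
    return levels, rows


levels, rows = design_matrix(["Tumor", "Normal", "Tumor", "Normal"])
print(levels)
for row in rows:
    print(row)
# Prints:
# ['Normal', 'Tumor']
# [0, 1]
# [1, 0]
# [0, 1]
# [1, 0]
```

In this parametrization each coefficient estimates one group's mean, which is why a contrast such as "Tumor-Normal" can be written directly in terms of the group names.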
Creating a design matrix can be complex and time-consuming, especially without a GUI, so this package includes a companion tool to help:
*edgeR Design Matrix Creator*.
Once the appropriate design matrix (with the corresponding links to the files) is provided,
the correct contrast ( http://en.wikipedia.org/wiki/Contrast_(statistics) ) must also be specified.

For example, to compare two equally weighted groups you would use a contrast such as
"g1-g2" or "normal-cancer", depending on the group names used in the design matrix.

The test function uses an MCF7 dataset from a study indicating that greater sequencing depth is not necessarily more important than a larger number of replicates [2].

Input
-----
Expression matrix
^^^^^^^^^^^^^^^^^
::

    Geneid  "\t" Sample-1 "\t" Sample-2 "\t" Sample-3 "\t" Sample-4 [...] "\n"
    SMURF   "\t" 123      "\t" 21       "\t" 34545    "\t" 98       ...   "\n"
    BRCA1   "\t" 435      "\t" 6655     "\t" 45       "\t" 55       ...   "\n"
    LINK33  "\t" 4        "\t" 645      "\t" 345      "\t" 1        ...   "\n"
    SNORD78 "\t" 498      "\t" 65       "\t" 98       "\t" 27       ...   "\n"
    [...]

*Note: Make sure the number of columns in the header is identical to the number of columns in the body.*

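A few lines of Python can check the column-count requirement from the note above before the file is uploaded. This is a minimal sketch, and check_expression_matrix is a hypothetical helper. It assumes a tab-separated file whose first column holds the gene identifiers, as in the example above.

```python
def check_expression_matrix(lines):
    """Return the sample names from the header of a tab-separated expression matrix.

    Raises ValueError when a row has a different number of columns than the header.
    """
    header = lines[0].rstrip("\n").split("\t")
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline at the end of the file
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(header):
            raise ValueError(
                f"line {number}: {len(fields)} columns, but the header has {len(header)}"
            )
    return header[1:]  # the first column holds the gene identifiers


table = [
    "Geneid\tSample-1\tSample-2\n",
    "SMURF\t123\t21\n",
    "BRCA1\t435\t6655\n",
]
print(check_expression_matrix(table))  # ['Sample-1', 'Sample-2']
```

For a file on disk, pass `handle.readlines()` from an opened file instead of the in-memory list.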
Design matrix
^^^^^^^^^^^^^
::

    Sample    "\t" Condition "\t" Ethnicity "\t" Patient "\t" Batch "\n"
    Sample-1  "\t" Tumor     "\t" European  "\t" 1       "\t" 1     "\n"
    Sample-2  "\t" Normal    "\t" European  "\t" 1       "\t" 1     "\n"
    Sample-3  "\t" Tumor     "\t" European  "\t" 2       "\t" 1     "\n"
    Sample-4  "\t" Normal    "\t" European  "\t" 2       "\t" 1     "\n"
    Sample-5  "\t" Tumor     "\t" African   "\t" 3       "\t" 1     "\n"
    Sample-6  "\t" Normal    "\t" African   "\t" 3       "\t" 1     "\n"
    Sample-7  "\t" Tumor     "\t" African   "\t" 4       "\t" 2     "\n"
    Sample-8  "\t" Normal    "\t" African   "\t" 4       "\t" 2     "\n"
    Sample-9  "\t" Tumor     "\t" Asian     "\t" 5       "\t" 2     "\n"
    Sample-10 "\t" Normal    "\t" Asian     "\t" 5       "\t" 2     "\n"
    Sample-11 "\t" Tumor     "\t" Asian     "\t" 6       "\t" 2     "\n"
    Sample-12 "\t" Normal    "\t" Asian     "\t" 6       "\t" 2     "\n"

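Both files above identify samples by name (Sample-1, Sample-2, ...). Relying on row order instead is fragile: if the design matrix lists the samples in a different order than the expression-matrix columns, each sample is paired with the wrong factor values. The minimal Python sketch below looks each row up by name. The helper align_design is hypothetical and is not the wrapper's actual code.

```python
def align_design(sample_columns, design_rows):
    """Return the design-matrix rows in the order of sample_columns.

    sample_columns: sample names as they appear in the expression-matrix header.
    design_rows: dict mapping each sample name to its list of factor values.
    """
    missing = [s for s in sample_columns if s not in design_rows]
    if missing:
        raise KeyError(f"samples missing from the design matrix: {missing}")
    # Look rows up by name, so the order of the design file does not matter.
    return [design_rows[s] for s in sample_columns]


# Design rows listed in a different order than the expression-matrix columns.
design = {
    "Sample-2": ["Normal", "European", "1", "1"],
    "Sample-1": ["Tumor", "European", "1", "1"],
}
print(align_design(["Sample-1", "Sample-2"], design))
# [['Tumor', 'European', '1', '1'], ['Normal', 'European', '1', '1']]
```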
*Note: Avoid factor names that (1) are numerical or (2) contain mathematical symbols; preferably use only letters.*

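One quick way to spot problematic factor names is to test each one against a letters-only pattern. This is a minimal sketch that assumes "only letters" means the ASCII letters A-Z and a-z. The helper bad_factor_names is hypothetical.

```python
import re

# Assumption: "only letters" is read as the ASCII letters A-Z and a-z.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def bad_factor_names(names):
    """Return the names that are not made up of letters only."""
    return [name for name in names if not LETTERS_ONLY.fullmatch(name)]


print(bad_factor_names(["Tumor", "Normal", "2", "Control+Placebo", "Asian"]))
# ['2', 'Control+Placebo']
```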
Contrast
^^^^^^^^
The contrast represents the biological question. Many different questions can be asked, e.g.:

- Tumor-Normal
- African-European
- 0.5*(Control+Placebo)-Treated

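A contrast is a linear combination of group names, as in limma's makeContrasts. The Python sketch below turns such a string into one weight per group. It binds each group name to a unit vector and then evaluates the arithmetic. It supports only +, -, unary minus and multiplication by a number. Weights and contrast_weights are hypothetical helpers and are not part of this tool.

```python
class Weights:
    """A vector of contrast weights, one per group level."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        return Weights(a + b for a, b in zip(self.values, other.values))

    def __sub__(self, other):
        return Weights(a - b for a, b in zip(self.values, other.values))

    def __neg__(self):
        return Weights(-a for a in self.values)

    def __mul__(self, factor):  # Weights * number
        if not isinstance(factor, (int, float)):
            return NotImplemented
        return Weights(a * factor for a in self.values)

    __rmul__ = __mul__  # number * Weights


def contrast_weights(expression, levels):
    """Evaluate a linear contrast such as "Tumor-Normal" into {level: weight}."""
    # Bind each level name to a unit vector (1.0 at its own position).
    names = {
        level: Weights(1.0 if i == j else 0.0 for j in range(len(levels)))
        for i, level in enumerate(levels)
    }
    # eval() is acceptable only because the contrast is trusted, hand-written
    # input; it is not a safe parser for untrusted text.
    result = eval(expression, {"__builtins__": {}}, names)
    return dict(zip(levels, result.values))


print(contrast_weights("Tumor-Normal", ["Normal", "Tumor"]))
# {'Normal': -1.0, 'Tumor': 1.0}
print(contrast_weights("0.5*(Control+Placebo)-Treated", ["Control", "Placebo", "Treated"]))
# {'Control': 0.5, 'Placebo': 0.5, 'Treated': -1.0}
```

This approach cannot evaluate group names that are not valid Python identifiers, for example names containing a hyphen or starting with a digit. That is one more reason to keep factor names to letters only.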
Installation
------------

This tool requires no specific configuration. The following dependencies are installed automatically:

- R
- Bioconductor
- limma
- edgeR

License
-------
- R

  - GPL 2 &amp; GPL 3

- limma

  - GPL (>=2)

- edgeR

  - GPL (>=2)

References
----------

edgeR
^^^^^
**[1] edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.**

*Mark D. Robinson, Davis J. McCarthy and Gordon K. Smyth* - Bioinformatics (2010) 26 (1): 139-140.

- http://www.bioconductor.org/packages/2.12/bioc/html/edgeR.html
- http://dx.doi.org/10.1093/bioinformatics/btp616
- http://www.bioconductor.org/packages/release/bioc/html/edgeR.html

Test-data (MCF7)
^^^^^^^^^^^^^^^^
**[2] RNA-seq differential expression studies: more sequence or more replication?**

*Yuwen Liu, Jie Zhou and Kevin P. White* - Bioinformatics (2014) 30 (3): 301-304.

- http://www.ncbi.nlm.nih.gov/pubmed/24319002
- http://dx.doi.org/10.1093/bioinformatics/btt688

Contact
-------

The tool wrapper was written by Youri Hoogstrate from the Erasmus
Medical Center (Rotterdam, Netherlands) on behalf of the Translational
Research IT (TraIT) project:

http://www.ctmm.nl/en/programmas/infrastructuren/traitprojecttranslationeleresearch

More tools by the Translational Research IT (TraIT) project can be found
in the following toolsheds:

http://toolshed.dtls.nl/

http://toolshed.g2.bx.psu.edu

http://testtoolshed.g2.bx.psu.edu/

I would like to thank Hina Riaz - Naz Khan for her helpful contribution.
    </help>
</tool>