Mercurial > repos > yhoogstrate > edger_with_design_matrix
annotate edgeR_Differential_Gene_Expression.xml @ 95:9dac2146b98c draft
Uploaded
author | yhoogstrate |
---|---|
date | Sun, 29 Mar 2015 02:41:02 -0400 |
parents | 46745f5666ac |
children |
rev | line source |
---|---|
25 | 1 <?xml version="1.0" encoding="UTF-8"?> |
91 | 2 <tool id="edger_dge" name="edgeR: Differential Gene(Expression) Analysis" version="3.0.3-latest.d"> |
25 | 3 <description>RNA-Seq gene expression analysis using edgeR (R package)</description> |
4 | |
5 <requirements> | |
67 | 6 <requirement type="package" version="3.0.3">R</requirement> |
77 | 7 <requirement type="package" version="latest">biocLite_edgeR_limma</requirement> |
25 | 8 </requirements> |
9 | |
95 | 10 <version_command> |
11 echo $(R --version | grep version | grep -v GNU) " , EdgeR version" $(R --vanilla --slave -e "library(edgeR) ; cat(sessionInfo()\$otherPkgs\$edgeR\$Version)" 2> /dev/null | grep -v "WARNING: ")</version_command> | |
25 | 12 <command> |
13 <!-- | |
14 The following script is written in the "Cheetah" language: | |
15 http://www.cheetahtemplate.org/docs/users_guide_html_multipage/contents.html | |
16 --> | |
17 | |
18 R --vanilla --slave -f $R_script '--args | |
19 $expression_matrix | |
20 $design_matrix | |
21 $contrast | |
22 | |
23 $fdr | |
24 | |
25 $output_count_edgeR | |
26 $output_cpm | |
27 | |
28 /dev/null <!-- Calculation of FPKM/RPKM should come here --> | |
29 | |
30 #if $output_raw_counts: | |
31 $output_raw_counts | |
32 #else: | |
33 /dev/null | |
34 #end if | |
35 | |
89
875f080136b6
Solved a very serious bug: if contrast and design matrix described samples not in the same order, statistical analysis goes wrong
yhoogstrate
parents:
83
diff
changeset
|
36 #if $output_MDSplot_logFC: |
37 $output_MDSplot_logFC | |
38 #else: | |
39 /dev/null | |
40 #end if | |
41 | |
42 #if $output_MDSplot_bcv: | |
43 $output_MDSplot_bcv | |
25 | 44 #else: |
45 /dev/null | |
46 #end if | |
47 | |
48 #if $output_BCVplot: | |
49 $output_BCVplot | |
50 #else: | |
51 /dev/null | |
52 #end if | |
53 | |
54 #if $output_MAplot: | |
55 $output_MAplot | |
56 #else: | |
57 /dev/null | |
58 #end if | |
59 | |
60 #if $output_PValue_distribution_plot: | |
61 $output_PValue_distribution_plot | |
62 #else: | |
63 /dev/null | |
64 #end if | |
65 | |
66 #if $output_hierarchical_clustering_plot: | |
67 $output_hierarchical_clustering_plot | |
68 #else: | |
69 /dev/null | |
70 #end if | |
71 | |
72 #if $output_heatmap_plot: | |
73 $output_heatmap_plot | |
74 #else: | |
75 /dev/null | |
76 #end if | |
77 | |
78 #if $output_RData_obj: | |
79 $output_RData_obj | |
80 #else: | |
81 /dev/null | |
82 #end if | |
55 | 83 |
84 $output_format_images | |
85 ' | |
25 | 86 #if $output_R: |
87 > $output_R | |
88 #else: | |
89 > /dev/null | |
90 #end if | |
91 </command> | |
92 | |
94 | 93 <stdio> |
94 <regex match="Calculating library sizes from column" | |
95 source="stderr" | |
96 level="log" /> | |
97 <regex match="During startup - Warning messages" | |
98 source="stderr" | |
99 level="log" /> | |
100 <regex match="Setting LC_[^ ]+ failed" | |
101 source="stderr" | |
102 level="warning" | |
103 description="LOCALE has not been set correctly" /> | |
104 </stdio> | |
105 | |
25 | 106 <inputs> |
107 <param name="expression_matrix" type="data" format="tabular" label="Expression (read count) matrix" /> | |
94 | 108 <param name="design_matrix" type="data" format="tabular" label="Design matrix" help="Ensure your sample names are identical to those in the expression matrix. Preferably, create the design matrix using 'edgeR: Design- from Expression matrix'." /> |
25 | 109 |
110 <param name="contrast" type="text" label="Contrast (biological question)" help="e.g. 'tumor-normal' or '(G1+G2)/2-G3' using the factors chosen in the design matrix. See the 'makeContrasts' documentation in the limma package for more info: http://www.bioconductor.org/packages/release/bioc/html/limma.html and http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf." /> | |
111 | |
112 <param name="fdr" type="float" min="0" max="1" value="0.05" label="False Discovery Rate (FDR)" /> | |
113 | |
114 <param name="outputs" type="select" label="Optional desired outputs" multiple="true" display="checkboxes"> | |
115 <option value="make_output_raw_counts">Raw counts table</option> | |
89 | 116 <option value="make_output_MDSplot_logFC">MDS-plot (logFC-method)</option> |
117 <option value="make_output_MDSplot_bcv">MDS-plot (BCV-method; much slower)</option> | |
25 | 118 <option value="make_output_BCVplot">BCV-plot</option> |
119 <option value="make_output_MAplot">MA-plot</option> | |
120 <option value="make_output_PValue_distribution_plot">P-Value distribution plot</option> | |
89 | 121 <option value="make_output_hierarchical_clustering_plot">Hierarchical clustering (under construction)</option> |
25 | 122 <option value="make_output_heatmap_plot">Heatmap</option> |
123 | |
43 | 124 <option value="make_output_R_stdout">R stdout</option> |
25 | 125 <option value="make_output_RData_obj">R Data object</option> |
126 </param> | |
55 | 127 |
128 <param name="output_format_images" type="select" label="Output format of images" display="radio"> | |
129 <option value="png">Portable network graphics (.png)</option> | |
130 <option value="pdf">Portable document format (.pdf)</option> | |
131 <option value="svg">Scalable vector graphics (.svg)</option> | |
132 </param> | |
25 | 133 </inputs> |
134 | |
135 <configfiles> | |
136 <configfile name="R_script"> | |
137 library(limma,quietly=TRUE) ## load quietly to avoid unnecessary stderr output | |
138 library(edgeR,quietly=TRUE) ## load quietly to avoid unnecessary stderr output | |
139 library(splines,quietly=TRUE) ## load quietly to avoid unnecessary stderr output | |
140 | |
141 ## Fetch commandline arguments | |
142 args <- commandArgs(trailingOnly = TRUE) | |
143 | |
144 expression_matrix_file = args[1] | |
145 design_matrix_file = args[2] | |
146 contrast = args[3] | |
147 | |
148 fdr = as.numeric(args[4]) ## command-line arguments arrive as strings; convert for the numeric FDR comparison | |
149 | |
150 output_count_edgeR = args[5] | |
151 output_cpm = args[6] | |
152 | |
43 | 153 output_xpkm = args[7] ##FPKM file - yet to be implemented |
25 | 154 |
155 output_raw_counts = args[8] | |
89 | 156 output_MDSplot_logFC = args[9] |
157 output_MDSplot_bcv = args[10] | |
158 output_BCVplot = args[11] | |
159 output_MAplot = args[12] | |
160 output_PValue_distribution_plot = args[13] | |
161 output_hierarchical_clustering_plot = args[14] | |
162 output_heatmap_plot = args[15] | |
163 output_RData_obj = args[16] | |
164 output_format_images = args[17] | |
25 | 165 |
166 | |
167 library(edgeR) | |
168 ##raw_data <- read.delim(designmatrix,header=T,stringsAsFactors=T) | |
169 ## Obtain read-counts | |
170 | |
171 expression_matrix <- read.delim(expression_matrix_file,header=T,stringsAsFactors=F,row.names=1,check.names=FALSE,na.strings=c("")) | |
172 design_matrix <- read.delim(design_matrix_file,header=T,stringsAsFactors=F,row.names=1,check.names=FALSE,na.strings=c("")) | |
173 | |
174 colnames(design_matrix) <- make.names(colnames(design_matrix)) | |
175 | |
176 for(i in 1:ncol(design_matrix)) { | |
89 | 177 old <- design_matrix[,i] |
178 design_matrix[,i] <- make.names(design_matrix[,i]) | |
25 | 179 if(paste(design_matrix[,i],collapse="\t") != paste(old,collapse="\t")) { |
180 print("Renaming of factors:") | |
181 print(old) | |
182 print("To:") | |
183 print(design_matrix[,i]) | |
184 } | |
45 | 185 ## The following line seems to break the script: |
186 ##design_matrix[,i] <- as.factor(design_matrix[,i]) | |
25 | 187 } |
188 | |
44 | 189 ## 1) In the expression matrix, you only want to have the samples described in the design matrix |
25 | 190 columns <- match(rownames(design_matrix),colnames(expression_matrix)) |
43 | 191 columns <- columns[!is.na(columns)] |
25 | 192 read_counts <- expression_matrix[,columns] |
193 | |
44 | 194 ## 2) In the design matrix, you only want to have samples of which you really have the counts |
89 | 195 columns <- match(colnames(read_counts),rownames(design_matrix)) |
44 | 196 columns <- columns[!is.na(columns)] |
197 design_matrix <- design_matrix[columns,,drop=FALSE] | |
25 | 198 |
199 ## Filter out HTSeq predefined counts: | |
200 exclude_HTSeq <- c("no_feature","ambiguous","too_low_aQual","not_aligned","alignment_not_unique") | |
201 exclude_DEXSeq <- c("_ambiguous","_empty","_lowaqual","_notaligned") | |
202 | |
44 | 203 exclude <- match(c(exclude_HTSeq, exclude_DEXSeq),rownames(read_counts)) |
204 exclude <- exclude[!is.na(exclude)] | |
25 | 205 if(length(exclude) != 0) { |
44 | 206 read_counts <- read_counts[-exclude,] |
25 | 207 } |
208 | |
209 | |
89 | 210 ## sorting expression matrix with the order of the read_counts |
211 ##order <- match(colnames(read_counts) , rownames(design_matrix)) | |
212 ##read_counts_ordered <- read_counts[,order2] | |
213 | |
44 | 214 empty_samples <- apply(read_counts,2,function(x) sum(x) == 0) |
25 | 215 if(sum(empty_samples) > 0) { |
216 write(paste("There are ",sum(empty_samples)," empty samples found:",sep=""),stderr()) | |
217 write(colnames(read_counts)[empty_samples],stderr()) | |
218 } else { | |
219 | |
220 dge <- DGEList(counts=read_counts,genes=rownames(read_counts)) | |
221 | |
222 formula <- paste(c("~0",make.names(colnames(design_matrix))),collapse = " + ") | |
223 design_matrix_tmp <- design_matrix | |
224 colnames(design_matrix_tmp) <- make.names(colnames(design_matrix_tmp)) | |
225 design <- model.matrix(as.formula(formula),design_matrix_tmp) | |
226 rm(design_matrix_tmp) | |
227 | |
228 # Filter prefixes | |
229 prefixes = colnames(design_matrix)[attr(design,"assign")] | |
230 avoid = nchar(prefixes) == nchar(colnames(design)) | |
231 replacements = substr(colnames(design),nchar(prefixes)+1,nchar(colnames(design))) | |
232 replacements[avoid] = colnames(design)[avoid] | |
233 colnames(design) = replacements | |
234 | |
235 # Do normalization | |
236 write("Calculating normalization factors...",stdout()) | |
237 dge <- calcNormFactors(dge) | |
238 write("Estimating common dispersion...",stdout()) | |
239 dge <- estimateGLMCommonDisp(dge,design) | |
240 write("Estimating trended dispersion...",stdout()) | |
241 dge <- estimateGLMTrendedDisp(dge,design) | |
242 write("Estimating tagwise dispersion...",stdout()) | |
243 dge <- estimateGLMTagwiseDisp(dge,design) | |
244 | |
245 | |
89 | 246 if(output_MDSplot_logFC != "/dev/null") { |
247 write("Creating MDS plot (logFC method)",stdout()) | |
248 points <- plotMDS.DGEList(dge,top=500,labels=rep("",nrow(dge\$samples)))# Get coordinates of inflexible plot | |
25 | 249 dev.off()# Kill it |
250 | |
91 | 251 if(output_format_images == "pdf") { |
89 | 252 pdf(output_MDSplot_logFC,height=14,width=14) |
55 | 253 } else if(output_format_images == "svg") { |
89 | 254 svg(output_MDSplot_logFC,height=14,width=14) |
91 | 255 } else { |
256 ## png(output_MDSplot_logFC) | |
257 ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/ | |
258 | |
259 bitmap(output_MDSplot_logFC,type="png16m",height=14,width=14) | |
70 | 260 } |
91 | 261 |
55 | 262 |
25 | 263 diff_x <- abs(max(points\$x)-min(points\$x)) |
264 diff_y <-(max(points\$y)-min(points\$y)) | |
89 | 265 plot(c(min(points\$x),max(points\$x) + 0.45 * diff_x), c(min(points\$y) - 0.05 * diff_y,max(points\$y) + 0.05 * diff_y), main="edgeR logFC-MDS Plot on top 500 genes",type="n", xlab="Leading logFC dim 1", ylab="Leading logFC dim 2") |
25 | 266 points(points\$x,points\$y,pch=20) |
89 | 267 text(points\$x, points\$y,rownames(dge\$samples),cex=1.25,col="gray",pos=4) |
25 | 268 rm(diff_x,diff_y) |
269 | |
270 dev.off() | |
271 } | |
272 | |
89 | 273 if(output_MDSplot_bcv != "/dev/null") { |
274 write("Creating MDS plot (bcv method)",stdout()) | |
93 | 275 |
276 ## 1. First create a virtual plot to obtain the desired coordinates | |
277 pdf("bcvmds.pdf") | |
278 points <- plotMDS.DGEList(dge,method="bcv",top=500,labels=rep("",nrow(dge\$samples))) | |
89 | 279 dev.off()# Kill it |
280 | |
93 | 281 ## 2. Re-plot the coordinates in a new figure with the desired size and settings. |
91 | 282 if(output_format_images == "pdf") { |
89 | 283 pdf(output_MDSplot_bcv,height=14,width=14) |
284 } else if(output_format_images == "svg") { | |
285 svg(output_MDSplot_bcv,height=14,width=14) | |
91 | 286 } else { |
287 ## png(output_MDSplot_bcv) | |
288 ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/ | |
289 | |
290 bitmap(output_MDSplot_bcv,type="png16m",height=14,width=14) | |
89 | 291 } |
292 | |
293 diff_x <- abs(max(points\$x)-min(points\$x)) | |
294 diff_y <-(max(points\$y)-min(points\$y)) | |
295 plot(c(min(points\$x),max(points\$x) + 0.45 * diff_x), c(min(points\$y) - 0.05 * diff_y,max(points\$y) + 0.05 * diff_y), main="edgeR BCV-MDS Plot",type="n", xlab="Leading BCV dim 1", ylab="Leading BCV dim 2") | |
296 points(points\$x,points\$y,pch=20) | |
297 text(points\$x, points\$y,rownames(dge\$samples),cex=1.25,col="gray",pos=4) | |
298 rm(diff_x,diff_y) | |
299 | |
300 dev.off() | |
301 } | |
302 | |
303 | |
25 | 304 if(output_BCVplot != "/dev/null") { |
305 write("Creating Biological coefficient of variation plot",stdout()) | |
60 | 306 |
91 | 307 if(output_format_images == "pdf") { |
60 | 308 pdf(output_BCVplot) |
309 } else if(output_format_images == "svg") { | |
310 svg(output_BCVplot) | |
91 | 311 } else { |
312 ## png(output_BCVplot) | |
313 ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/ | |
314 | |
315 bitmap(output_BCVplot,type="png16m") | |
70 | 316 } |
60 | 317 |
25 | 318 plotBCV(dge, cex=0.4, main="edgeR: Biological coefficient of variation (BCV) vs abundance") |
319 dev.off() | |
320 } | |
321 | |
322 | |
323 write("Fitting GLM...",stdout()) | |
324 fit <- glmFit(dge,design) | |
325 | |
326 write(paste("Performing likelihood ratio test: ",contrast,sep=""),stdout()) | |
327 cont <- c(contrast) | |
328 cont <- makeContrasts(contrasts=cont, levels=design) | |
329 | |
330 lrt <- glmLRT(fit, contrast=cont[,1]) | |
331 write(paste("Exporting to file: ",output_count_edgeR,sep=""),stdout()) | |
332 write.table(file=output_count_edgeR,topTags(lrt,n=nrow(read_counts))\$table,sep="\t",row.names=TRUE,col.names=NA) | |
333 write.table(file=output_cpm,cpm(dge,normalized.lib.sizes=TRUE),sep="\t",row.names=TRUE,col.names=NA) | |
334 | |
335 ## todo EXPORT FPKM | |
336 write.table(file=output_raw_counts,dge\$counts,sep="\t",row.names=TRUE,col.names=NA) | |
337 | |
34 | 338 if(output_MAplot != "/dev/null" || output_PValue_distribution_plot != "/dev/null") { |
25 | 339 etable <- topTags(lrt, n=nrow(dge))\$table |
340 etable <- etable[order(etable\$FDR), ] | |
32 | 341 |
342 if(output_MAplot != "/dev/null") { | |
343 write("Creating MA plot...",stdout()) | |
60 | 344 |
91 | 345 if(output_format_images == "pdf") { |
60 | 346 pdf(output_MAplot) |
347 } else if(output_format_images == "svg") { | |
348 svg(output_MAplot) | |
91 | 349 } else { |
350 ## png(output_MAplot) | |
351 ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/ | |
352 | |
353 bitmap(output_MAplot,type="png16m") | |
70 | 354 } |
60 | 355 |
32 | 356 with(etable, plot(logCPM, logFC, pch=20, main="edgeR: Fold change vs abundance")) |
357 with(subset(etable, FDR < fdr), points(logCPM, logFC, pch=20, col="red")) | |
358 abline(h=c(-1,1), col="blue") | |
359 dev.off() | |
360 } | |
25 | 361 |
32 | 362 if(output_PValue_distribution_plot != "/dev/null") { |
363 write("Creating P-value distribution plot...",stdout()) | |
60 | 364 |
91 | 365 if(output_format_images == "pdf") { |
366 pdf(output_PValue_distribution_plot,width=14,height=14) | |
60 | 367 } else if(output_format_images == "svg") { |
91 | 368 svg(output_PValue_distribution_plot,width=14,height=14) |
369 } else { | |
370 ## png(output_PValue_distribution_plot) | |
371 ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/ | |
372 | |
373 bitmap(output_PValue_distribution_plot,type="png16m",width=14,height=14) | |
70 | 374 } |
60 | 375 |
32 | 376 expressed_genes <- subset(etable, PValue < 0.99) |
377 h <- hist(expressed_genes\$PValue,breaks=nrow(expressed_genes)/15,main="Binned P-Values (< 0.99)") | |
378 center <- sum(h\$counts) / length(h\$counts) | |
379 lines(c(0,1),c(center,center),lty=2,col="red",lwd=2) | |
380 k <- ksmooth(h\$mid, h\$counts) | |
381 lines(k\$x,k\$y,col="red",lwd=2) | |
382 rmsd <- (h\$counts) - center | |
383 rmsd <- rmsd^2 | |
384 rmsd <- sum(rmsd) | |
385 rmsd <- sqrt(rmsd) | |
386 text(0,max(h\$counts),paste("e=",round(rmsd,2),sep=""),pos=4,col="blue") | |
387 ## change e into epsilon somehow | |
388 dev.off() | |
389 } | |
40 | 390 } |
391 | |
392 if(output_heatmap_plot != "/dev/null") { | |
60 | 393 |
91 | 394 if(output_format_images == "pdf") { |
60 | 395 pdf(output_heatmap_plot,width=10.5) |
396 } else if(output_format_images == "svg") { | |
397 svg(output_heatmap_plot,width=10.5) | |
91 | 398 } else { |
399 ## png(output_heatmap_plot) | |
400 ## png does not work out of the box in the Galaxy Toolshed Version of R due to its compile settings: https://biostar.usegalaxy.org/p/9170/ | |
401 | |
402 bitmap(output_heatmap_plot,type="png16m",width=10.5) | |
70 | 403 } |
60 | 404 |
40 | 405 etable2 <- topTags(lrt, n=100)\$table |
406 order <- rownames(etable2) | |
407 cpm_sub <- cpm(dge,normalized.lib.sizes=TRUE,log=TRUE)[as.numeric(order),] | |
408 heatmap(t(cpm_sub)) | |
409 dev.off() | |
25 | 410 } |
411 | |
412 ## TODO: the hierarchical clustering plot (output_hierarchical_clustering_plot, args[14]) is not implemented yet | |
413 | |
35 | 414 if(output_RData_obj != "/dev/null") { |
25 | 415 save.image(output_RData_obj) |
416 } | |
417 | |
418 write("Done!",stdout()) | |
419 } | |
420 </configfile> | |
421 </configfiles> | |
422 | |
423 <outputs> | |
53 | 424 <data format="tabular" name="output_count_edgeR" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - differentially expressed genes" /> |
25 | 425 <data format="tabular" name="output_cpm" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - CPM" /> |
426 | |
427 <data format="tabular" name="output_raw_counts" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - raw counts"> | |
53 | 428 <filter>outputs and ("make_output_raw_counts" in outputs)</filter> |
25 | 429 </data> |
430 | |
89 | 431 <data format="png" name="output_MDSplot_logFC" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - MDS-plot (logFC method)"> |
432 <filter>outputs and ("make_output_MDSplot_logFC" in outputs)</filter> | |
433 | |
434 <change_format> | |
435 <when input="output_format_images" value="png" format="png" /> | |
436 <when input="output_format_images" value="pdf" format="pdf" /> | |
437 <when input="output_format_images" value="svg" format="svg" /> | |
438 </change_format> | |
439 </data> | |
440 | |
441 <data format="png" name="output_MDSplot_bcv" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - MDS-plot (bcv method)"> | |
442 <filter>outputs and ("make_output_MDSplot_bcv" in outputs)</filter> | |
59 | 443 |
444 <change_format> | |
445 <when input="output_format_images" value="png" format="png" /> | |
446 <when input="output_format_images" value="pdf" format="pdf" /> | |
447 <when input="output_format_images" value="svg" format="svg" /> | |
448 </change_format> | |
25 | 449 </data> |
450 | |
60 | 451 <data format="png" name="output_BCVplot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - BCV-plot"> |
53 | 452 <filter>outputs and ("make_output_BCVplot" in outputs)</filter> |
60 | 453 |
454 <change_format> | |
455 <when input="output_format_images" value="png" format="png" /> | |
456 <when input="output_format_images" value="pdf" format="pdf" /> | |
457 <when input="output_format_images" value="svg" format="svg" /> | |
458 </change_format> | |
25 | 459 </data> |
460 | |
60 | 461 <data format="png" name="output_MAplot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - MA-plot"> |
53 | 462 <filter>outputs and ("make_output_MAplot" in outputs)</filter> |
60 | 463 |
464 <change_format> | |
465 <when input="output_format_images" value="png" format="png" /> | |
466 <when input="output_format_images" value="pdf" format="pdf" /> | |
467 <when input="output_format_images" value="svg" format="svg" /> | |
468 </change_format> | |
25 | 469 </data> |
470 | |
60 | 471 <data format="png" name="output_PValue_distribution_plot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - P-Value distribution"> |
53 | 472 <filter>outputs and ("make_output_PValue_distribution_plot" in outputs)</filter> |
60 | 473 |
474 <change_format> | |
475 <when input="output_format_images" value="png" format="png" /> | |
476 <when input="output_format_images" value="pdf" format="pdf" /> | |
477 <when input="output_format_images" value="svg" format="svg" /> | |
478 </change_format> | |
25 | 479 </data> |
480 | |
60 | 481 <data format="png" name="output_hierarchical_clustering_plot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - Hierarchical clustering"> |
53 | 482 <filter>outputs and ("make_output_hierarchical_clustering_plot" in outputs)</filter> |
60 | 483 |
484 <change_format> | |
485 <when input="output_format_images" value="png" format="png" /> | |
486 <when input="output_format_images" value="pdf" format="pdf" /> | |
487 <when input="output_format_images" value="svg" format="svg" /> | |
488 </change_format> | |
25 | 489 </data> |
490 | |
60 | 491 <data format="png" name="output_heatmap_plot" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - Heatmap"> |
53 | 492 <filter>outputs and ("make_output_heatmap_plot" in outputs)</filter> |
60 | 493 |
494 <change_format> | |
495 <when input="output_format_images" value="png" format="png" /> | |
496 <when input="output_format_images" value="pdf" format="pdf" /> | |
497 <when input="output_format_images" value="svg" format="svg" /> | |
498 </change_format> | |
25 | 499 </data> |
500 | |
501 <data format="RData" name="output_RData_obj" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - R data object"> | |
53 | 502 <filter>outputs and ("make_output_RData_obj" in outputs)</filter> |
25 | 503 </data> |
504 | |
40 | 505 <data format="txt" name="output_R" label="edgeR DGE on ${design_matrix.hid}: ${design_matrix.name} - R output (debug)" > |
53 | 506 <filter>outputs and ("make_output_R_stdout" in outputs)</filter> |
25 | 507 </data> |
508 </outputs> | |
509 | |
94 | 510 <tests> |
511 <test> | |
512 <param name="expression_matrix" value="Differential_Gene_Expression/expression_matrix.tabular.txt" /> | |
513 <param name="design_matrix" value="Differential_Gene_Expression/design_matrix.tabular.txt" /> | |
514 | |
515 <param name="contrast" value="E-C"/> | |
516 | |
517 <param name="fdr" value="0.05" /> | |
518 | |
519 <param name="output_format_images" value="png" /> | |
520 | |
521 <output name="output_count_edgeR" file="Differential_Gene_Expression/differentially_expressed_genes.tabular.txt" /> | |
522 </test> | |
523 </tests> | |
524 | |
25 | 525 <help> |
526 edgeR: Differential Gene(Expression) Analysis | |
36 | 527 ############################################# |
25 | 528 |
36 | 529 Overview |
530 -------- | |
531 Differential expression analysis of RNA-seq and digital gene expression profiles with biological replication. Uses empirical Bayes estimation and exact tests based on the negative binomial distribution. Also useful for differential signal analysis with other types of genome-scale count data [1]. | |
25 | 532 |
533 For every experiment, the algorithm requires a design matrix. This matrix describes which samples belong to which groups. | |
36 | 534 More details on this are given in the edgeR manual: http://www.bioconductor.org/packages/2.12/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf |
25 | 535 and the limma manual. |
536 | |
537 Because creating a design matrix by hand can be complex and time-consuming, especially without a GUI, this package includes a companion tool to help with it. | |
538 This tool is called *edgeR Design Matrix Creator*. | |
539 Once the appropriate design matrix (with corresponding links to the files) has been provided, | |
540 the correct contrast ( http://en.wikipedia.org/wiki/Contrast_(statistics) ) must also be specified. | |
541 | |
542 For example, to compare two equally weighted groups, the contrast would be either | |
79 | 543 "g1-g2" or "normal-cancer". |
25 | 544 |
36 | 545 The test function uses an MCF7 dataset from a study indicating that a higher sequencing depth is not necessarily more important than a larger number of replicates [2]. |
25 | 546 |
36 | 547 Input |
548 ----- | |
549 Expression matrix | |
550 ^^^^^^^^^^^^^^^^^ | |
551 :: | |
25 | 552 |
553 Geneid "\t" Sample-1 "\t" Sample-2 "\t" Sample-3 "\t" Sample-4 [...] "\n" | |
554 SMURF "\t" 123 "\t" 21 "\t" 34545 "\t" 98 ... "\n" | |
555 BRCA1 "\t" 435 "\t" 6655 "\t" 45 "\t" 55 ... "\n" | |
556 LINK33 "\t" 4 "\t" 645 "\t" 345 "\t" 1 ... "\n" | |
557 SNORD78 "\t" 498 "\t" 65 "\t" 98 "\t" 27 ... "\n" | |
558 [...] | |
559 | |
36 | 560 *Note: Make sure the number of columns in the header is identical to the number of columns in the body.* |
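
The column check described in the note can be sketched in Python. This is only an illustration and not part of the tool: the function name `find_ragged_rows` is hypothetical, and the input is assumed to be a tab-separated file whose header names every column, including the gene identifier column.

```python
import csv
import io


def find_ragged_rows(handle):
    """Return (line_number, n_columns) for each body row whose column
    count differs from the header's column count."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # first line holds the column names
    ragged = []
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            ragged.append((line_number, len(row)))
    return ragged


# Hypothetical in-memory example; a real file would be opened with open(path, newline="").
example = io.StringIO("Geneid\tSample-1\tSample-2\nSMURF\t123\t21\nBRCA1\t435\n")
print(find_ragged_rows(example))  # the BRCA1 row has only two columns
```

An empty result means header and body agree. A table whose header lacks a field for the gene identifier column would be reported on every row.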
25 | 561 |
36 | 562 Design matrix |
563 ^^^^^^^^^^^^^ | |
564 :: | |
25 | 565 |
566 Sample "\t" Condition "\t" Ethnicity "\t" Patient "\t" Batch "\n" | |
567 Sample-1 "\t" Tumor "\t" European "\t" 1 "\t" 1 "\n" | |
568 Sample-2 "\t" Normal "\t" European "\t" 1 "\t" 1 "\n" | |
569 Sample-3 "\t" Tumor "\t" European "\t" 2 "\t" 1 "\n" | |
570 Sample-4 "\t" Normal "\t" European "\t" 2 "\t" 1 "\n" | |
571 Sample-5 "\t" Tumor "\t" African "\t" 3 "\t" 1 "\n" | |
572 Sample-6 "\t" Normal "\t" African "\t" 3 "\t" 1 "\n" | |
573 Sample-7 "\t" Tumor "\t" African "\t" 4 "\t" 2 "\n" | |
574 Sample-8 "\t" Normal "\t" African "\t" 4 "\t" 2 "\n" | |
575 Sample-9 "\t" Tumor "\t" Asian "\t" 5 "\t" 2 "\n" | |
576 Sample-10 "\t" Normal "\t" Asian "\t" 5 "\t" 2 "\n" | |
577 Sample-11 "\t" Tumor "\t" Asian "\t" 6 "\t" 2 "\n" | |
578 Sample-12 "\t" Normal "\t" Asian "\t" 6 "\t" 2 "\n" | |
579 | |
36 | 580 *Note: Avoid factor names that (1) are numerical or (2) contain mathematical symbols; preferably use only letters.* |
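
A quick way to screen factor names before running the tool could look like the Python sketch below. It is an assumption-laden illustration, not something the tool runs: the function name `classify_factor_name`, the set of characters treated as mathematical symbols, and the category labels are all hypothetical choices.

```python
import re

# Assumption: "numerical" means a plain integer or decimal number.
NUMBER = re.compile(r"^[0-9]+([.][0-9]+)?$")
# Assumption: these characters count as mathematical symbols.
MATH_SYMBOLS = set("+-*/^()=")


def classify_factor_name(name):
    """Label a factor name according to the cases named in the note."""
    if NUMBER.match(name):
        return "numerical"
    if any(ch in MATH_SYMBOLS for ch in name):
        return "contains mathematical symbol"
    if not name.isalpha():
        return "not letters only"
    return "ok"


# Hypothetical names, chosen to hit each case once.
for name in ["Tumor", "42", "Stage-2", "Batch_1"]:
    print(name, "->", classify_factor_name(name))
```

Names labelled anything other than "ok" are candidates for renaming, for example "Stage-2" to "StageTwo".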
25 | 581 |
36 | 582 Contrast |
583 ^^^^^^^^ | |
584 The contrast represents the biological question. There can be many questions asked, e.g.: | |
25 | 585 |
36 | 586 - Tumor-Normal |
587 - African-European | |
588 - 0.5*(Control+Placebo) - Treated | |
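
One way to read a contrast is as a set of weights, one per group. The Python sketch below is illustrative only and does not describe this wrapper's R script: it binds each group name to an indicator vector and evaluates the contrast expression, an approach assumed here to resemble how linear contrasts are commonly built. The names `Weights` and `contrast_weights` are hypothetical.

```python
class Weights:
    """Per-group weights supporting +, - and multiplication by a scalar."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        return Weights(a + b for a, b in zip(self.values, other.values))

    def __sub__(self, other):
        return Weights(a - b for a, b in zip(self.values, other.values))

    def __mul__(self, scalar):
        return Weights(a * scalar for a in self.values)

    __rmul__ = __mul__  # allows 0.5 * weights as well as weights * 0.5


def contrast_weights(contrast, groups):
    """Evaluate a contrast string and return a {group: weight} mapping."""
    names = {}
    for i, group in enumerate(groups):
        # Each group starts as an indicator vector: 1 for itself, 0 elsewhere.
        names[group] = Weights(1.0 if j == i else 0.0 for j in range(len(groups)))
    # eval is used for brevity only; never pass untrusted text to it.
    result = eval(contrast, {"__builtins__": {}}, names)
    return dict(zip(groups, result.values))


print(contrast_weights("Tumor-Normal", ["Normal", "Tumor"]))
```

For "Tumor-Normal" the Tumor group receives weight 1 and the Normal group weight -1. A group name containing "-" would be parsed as a subtraction, which is one reason to keep names to letters only.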
25 | 589 |
36 | 590 Installation |
591 ------------ | |
25 | 592 |
593 This tool requires no specific configuration. The following dependencies are installed automatically: | |
36 | 594 |
595 - R | |
596 - Bioconductor | |
79 | 597 - limma |
598 - edgeR | |
25 | 599 |
36 | 600 License |
601 ------- | |
602 - R | |
79 | 603 - GPL 2 & GPL 3 |
36 | 604 - limma |
605 - GPL (>=2) | |
606 - edgeR | |
79 | 607 - GPL (>=2) |
36 | 608 |
609 References | |
610 ---------- | |
611 | |
612 EdgeR | |
613 ^^^^^ | |
614 **[1] edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.** | |
25 | 615 |
36 | 616 *Mark D. Robinson, Davis J. McCarthy and Gordon K. Smyth* - Bioinformatics (2010) 26 (1): 139-140. |
617 | |
618 - http://www.bioconductor.org/packages/2.12/bioc/html/edgeR.html | |
619 - http://dx.doi.org/10.1093/bioinformatics/btp616 | |
620 - http://www.bioconductor.org/packages/release/bioc/html/edgeR.html | |
25 | 621 |
36 | 622 Test-data (MCF7) |
623 ^^^^^^^^^^^^^^^^ | |
624 **[2] RNA-seq differential expression studies: more sequence or more replication?** | |
625 | |
626 *Yuwen Liu, Jie Zhou and Kevin P. White* - Bioinformatics (2014) 30 (3): 301-304. | |
627 | |
628 - http://www.ncbi.nlm.nih.gov/pubmed/24319002 | |
629 - http://dx.doi.org/10.1093/bioinformatics/btt688 | |
630 | |
631 Contact | |
632 ------- | |
79 | 633 |
634 The tool wrapper has been written by Youri Hoogstrate from the Erasmus | |
635 Medical Center (Rotterdam, Netherlands) on behalf of the Translational | |
636 Research IT (TraIT) project: | |
83 | 637 |
25 | 638 http://www.ctmm.nl/en/programmas/infrastructuren/traitprojecttranslationeleresearch |
639 | |
79 | 640 More tools by the Translational Research IT (TraIT) project can be found |
641 in the following toolsheds: | |
83 | 642 |
643 http://toolshed.dtls.nl/ | |
644 | |
645 http://toolshed.g2.bx.psu.edu | |
646 | |
647 http://testtoolshed.g2.bx.psu.edu/ | |
79 | 648 |
36 | 649 I would like to thank Hina Riaz - Naz Khan for her helpful contribution. |
25 | 650 </help> |
94 | 651 |
652 <citations> | |
653 <citation type="doi">10.1093/bioinformatics/btp616</citation> | |
654 <citation type="doi">10.1093/bioinformatics/btt688</citation> | |
655 </citations> | |
25 | 656 </tool> |