# HG changeset patch # User mingchen0919 # Date 1520558010 18000 # Node ID 405d8fa2f560a2f32da23c96692d853f198ce542 # Parent 157dc02992d90a26e500e270fc24e921405d41de version 2.2.0 diff -r 157dc02992d9 -r 405d8fa2f560 DESeq.Rmd --- a/DESeq.Rmd Tue Feb 27 20:44:34 2018 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,97 +0,0 @@ ---- -title: 'DESeq2: Perform DESeq analysis' -output: - html_document: - number_sections: true - toc: true - theme: cosmo - highlight: tango ---- - -```{r setup, include=FALSE, warning=FALSE, message=FALSE} -knitr::opts_chunk$set( - echo = as.logical(opt$X_e), - error = TRUE -) -``` - -# `DESeqDataSet` object - -```{r 'DESeqDataSet object'} -count_file_paths = strsplit(opt$X_P, ',')[[1]] -count_file_names = strsplit(opt$X_N, ',')[[1]] -sample_table = read.table(opt$X_S, header = TRUE) -row.names(sample_table) = sample_table[,2] -sample_table = sample_table[count_file_names, ] - -## copy count files into OUTPUT_DIR/counts -dir.create(paste0(OUTPUT_DIR, '/counts'), recursive = TRUE) -file_copy = file.copy(count_file_paths, paste0(OUTPUT_DIR, '/counts/', count_file_names), overwrite = TRUE) - -## DESeqDataSet object -dds = DESeqDataSetFromHTSeqCount(sampleTable = sample_table, - directory = paste0(OUTPUT_DIR, '/counts'), - design = formula(opt$X_p)) -dds -``` - -# Pre-filtering the dataset. - -We can remove the rows that have 0 or 1 count to reduce object size and increase the calculation speed. 
- -* Number of rows before pre-filtering -```{r} -nrow(dds) -``` - -* Number of rows after pre-filtering -```{r} -dds = dds[rowSums(counts(dds)) > 1, ] -nrow(dds) -``` - -# Peek at data {.tabset} - -## Count Data - -```{r 'count data'} -datatable(head(counts(dds), 100), style="bootstrap", - class="table-condensed", options = list(dom = 'tp', scrollX = TRUE)) -``` - -## Sample Table - -```{r 'sample table'} -datatable(sample_table, style="bootstrap", - class="table-condensed", options = list(dom = 'tp', scrollX = TRUE)) -``` - -# Sample distance on variance stabilized data {.tabset} - -## `rlog` Stabilizing transformation - -```{r} -rld = rlog(dds, blind = FALSE) -datatable(head(assay(rld), 100), style="bootstrap", - class="table-condensed", options = list(dom = 'tp', scrollX = TRUE)) -``` - -## Sample distance - -```{r} -sampleDists <- dist(t(assay(rld))) -sampleDists -``` - -# Differential expression analysis - -```{r} -dds <- DESeq(dds) -``` - -```{r echo=FALSE} -# save objects except for opt. -save(list=ls()[ls() != "opt"], file=opt$X_w) -``` - - diff -r 157dc02992d9 -r 405d8fa2f560 DESeq.xml --- a/DESeq.xml Tue Feb 27 20:44:34 2018 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,106 +0,0 @@ - - - "some description" - - - pandoc - r-getopt - r-rmarkdown - bioconductor-deseq2 - r-dt - r-pheatmap - - - - - - - - - - - - - - - - - - - - - - - - - - @article{love2014moderated, - title={Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2}, - author={Love, Michael I and Huber, Wolfgang and Anders, Simon}, - journal={Genome biology}, - volume={15}, - number={12}, - pages={550}, - year={2014}, - publisher={BioMed Central} - } - - - - - diff -r 157dc02992d9 -r 405d8fa2f560 DESeq_render.R --- a/DESeq_render.R Tue Feb 27 20:44:34 2018 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,66 +0,0 @@ -##============ Sink warnings and errors to a file ============== -## use the sink() function to wrap all code within it. 
-##============================================================== -zz = file('warnings_and_errors.txt') -sink(zz) -sink(zz, type = 'message') - -#------------import libraries-------------------- -options(stringsAsFactors = FALSE) - -library(getopt) -library(rmarkdown) -library(DESeq2) -library(pheatmap) -library(DT) -#------------------------------------------------ - - -#------------get arguments into R-------------------- -# getopt_specification_matrix(extract_short_flags('')) %>% -# write.table(file = 'spec.txt', sep = ',', row.names = FALSE, col.names = TRUE, quote = FALSE) - - -spec_matrix = as.matrix( - data.frame(stringsAsFactors=FALSE, - long_flags = c("X_e", "X_o", "X_d", "X_s", "X_t", "X_P", "X_N", - "X_S", "X_p", "X_w"), - short_flags = c("e", "o", "d", "s", "t", "P", "N", "S", "p", "w"), - argument_mask_flags = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), - data_type_flags = c("character", "character", "character", "character", - "character", "character", "character", - "character", "character", "character") - ) -) -opt = getopt(spec_matrix) -#---------------------------------------------------- - - -#-----------using passed arguments in R -# to define system environment variables--- -do.call(Sys.setenv, opt[-1]) -#---------------------------------------------------- - -#---------- often used variables ---------------- -# OUTPUT_DIR: path to the output associated directory, which stores all outputs -# TOOL_DIR: path to the tool installation directory -# RMD_NAME: name of Rmd file to be rendered -# OUTPUT_REPORT: path to galaxy output report -OUTPUT_DIR = opt$X_d -TOOL_DIR = opt$X_t -RMD_NAME = 'DESeq.Rmd' -OUTPUT_REPORT = opt$X_o - -# create the output associated directory to store all outputs -dir.create(OUTPUT_DIR, recursive = TRUE) - -#-----------------render Rmd-------------- -render(paste0(TOOL_DIR, '/', RMD_NAME), output_file = OUTPUT_REPORT) -#------------------------------------------ - -#==============the end============== - - -##--------end of 
code rendering .Rmd templates---------------- -sink() -##=========== End of sinking output============================= \ No newline at end of file diff -r 157dc02992d9 -r 405d8fa2f560 DESeq_results.Rmd --- a/DESeq_results.Rmd Tue Feb 27 20:44:34 2018 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,109 +0,0 @@ ---- -title: 'DESeq2: Results' -output: - html_document: - number_sections: true - toc: true - theme: cosmo - highlight: tango ---- - -```{r setup, include=FALSE, warning=FALSE, message=FALSE} -knitr::opts_chunk$set( - echo = as.logical(opt$X_e), - error = TRUE -) -``` - - -```{r eval=TRUE} -# Import workspace -# fcp = file.copy(opt$X_W, "deseq.RData") -load(opt$X_W) -``` - -# Results {.tabset} - -## Result table - -```{r} -cat('--- View the top 100 rows of the result table ---') -res <- results(dds, contrast = c(opt$X_C, opt$X_T, opt$X_K)) -write.csv(as.data.frame(res), file = opt$X_R) -res_df = as.data.frame(res)[1:100, ] -datatable(res_df, style="bootstrap", filter = 'top', - class="table-condensed", options = list(dom = 'tp', scrollX = TRUE)) -``` - -## Result summary - -```{r} -summary(res) -``` - - -# MA-plot {.tabset} - - - -```{r} -cat('--- Shrinked with Bayesian procedure ---') -plotMA(res) -``` - - -# Histogram of p values - -```{r} -hist(res$pvalue[res$baseMean > 1], breaks = 0:20/20, - col = "grey50", border = "white", main = "", - xlab = "Mean normalized count larger than 1") -``` - - -# Visualization {.tabset} -## Gene clustering - -```{r} -clustering_groups = strsplit(opt$X_M, ',')[[1]] - -topVarGenes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 20) -mat <- assay(rld)[ topVarGenes, ] -mat <- mat - rowMeans(mat) -annotation_col <- as.data.frame(colData(rld)[, clustering_groups]) -colnames(annotation_col) = clustering_groups -rownames(annotation_col) = colnames(mat) -pheatmap(mat, annotation_col = annotation_col) -``` - -## Sample-to-sample distance - -```{r} -sampleDistMatrix <- as.matrix( sampleDists ) -colors <- 
colorRampPalette( rev(brewer.pal(9, "Blues")) )(255) -pheatmap(sampleDistMatrix, - clustering_distance_cols = sampleDists, - col = colors) -``` - -## PCA plot - -```{r} -plotPCA(rld, intgroup = clustering_groups) -``` - -## MDS plot {.tabset} - -### Data table -```{r} -mds <- as.data.frame(colData(rld)) %>% - cbind(cmdscale(sampleDistMatrix)) -knitr::kable(mds) -``` - -### Plot -```{r} -ggplot(mds, aes(x = `1`, y = `2`, col = time)) + - geom_point(size = 3) + coord_fixed() -``` - diff -r 157dc02992d9 -r 405d8fa2f560 DESeq_results.xml --- a/DESeq_results.xml Tue Feb 27 20:44:34 2018 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,99 +0,0 @@ - - - pandoc - r-getopt - r-rmarkdown - bioconductor-deseq2 - r-dt - r-pheatmap - - - An R Markdown tool to display DESeq analysis. - - - - - - - - - - - - - - - - - - - - - - - - @article{love2014moderated, - title={Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2}, - author={Love, Michael I and Huber, Wolfgang and Anders, Simon}, - journal={Genome biology}, - volume={15}, - number={12}, - pages={550}, - year={2014}, - publisher={BioMed Central} - } - - - @article{allaire2016rmarkdown, - title={rmarkdown: Dynamic Documents for R, 2016}, - author={Allaire, J and Cheng, Joe and Xie, Yihui and McPherson, Jonathan and Chang, Winston and Allen, Jeff - and Wickham, Hadley and Atkins, Aron and Hyndman, Rob}, - journal={R package version 0.9}, - volume={6}, - year={2016} - } - - - @book{xie2015dynamic, - title={Dynamic Documents with R and knitr}, - author={Xie, Yihui}, - volume={29}, - year={2015}, - publisher={CRC Press} - } - - - \ No newline at end of file diff -r 157dc02992d9 -r 405d8fa2f560 DESeq_results_render.R --- a/DESeq_results_render.R Tue Feb 27 20:44:34 2018 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,71 +0,0 @@ -##============ Sink warnings and errors to a file ============== -## use the sink() function to wrap all code within it. 
-##============================================================== -zz = file('warnings_and_errors.txt') -sink(zz) -sink(zz, type = 'message') - -#------------import libraries-------------------- -options(stringsAsFactors = FALSE) - -library(getopt) -library(rmarkdown) -library(DESeq2) -library(pheatmap) -library(DT) -library(ggplot2) -library(genefilter) -library(RColorBrewer) -#------------------------------------------------ - - -#------------get arguments into R-------------------- -# getopt_specification_matrix(extract_short_flags('DESeq_results.xml')) %>% -# write.table(file = 'spec.txt', sep = ',', row.names = FALSE, col.names = TRUE, quote = FALSE) - - -spec_matrix = as.matrix( - data.frame(stringsAsFactors=FALSE, - long_flags = c("X_e", "X_W", "X_C", "X_T", "X_K", "X_M", "X_o", - "X_d", "X_s", "X_R", "X_t"), - short_flags = c("e", "W", "C", "T", "K", "M", "o", "d", "s", "R", - "t"), - argument_mask_flags = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), - data_type_flags = c("character", "character", "character", "character", - "character", "character", "character", - "character", "character", "character", "character") - ) -) -opt = getopt(spec_matrix) -opt -#---------------------------------------------------- - - -#-----------using passed arguments in R -# to define system environment variables--- -do.call(Sys.setenv, opt[-1]) -#---------------------------------------------------- - -#---------- often used variables ---------------- -# OUTPUT_DIR: path to the output associated directory, which stores all outputs -# TOOL_DIR: path to the tool installation directory -# RMD_NAME: name of Rmd file to be rendered -# OUTPUT_REPORT: path to galaxy output report -OUTPUT_DIR = opt$X_d -TOOL_DIR = opt$X_t -RMD_NAME = 'DESeq_results.Rmd' -OUTPUT_REPORT = opt$X_o - -# create the output associated directory to store all outputs -dir.create(OUTPUT_DIR, recursive = TRUE) - -#-----------------render Rmd-------------- -render(paste0(TOOL_DIR, '/', RMD_NAME), output_file = 
OUTPUT_REPORT)
-#------------------------------------------
-
-#==============the end==============
-
-
-##--------end of code rendering .Rmd templates----------------
-sink()
-##=========== End of sinking output=============================
\ No newline at end of file
diff -r 157dc02992d9 -r 405d8fa2f560 deseq2.Rmd
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/deseq2.Rmd Thu Mar 08 20:13:30 2018 -0500
@@ -0,0 +1,29 @@
+---
+title: 'HTML report title'
+output:
+  html_document:
+    number_sections: true
+    toc: true
+    theme: cosmo
+    highlight: tango
+---
+
+```{r setup, include=FALSE, warning=FALSE, message=FALSE}
+knitr::opts_chunk$set(
+  echo = as.logical(),
+  error = TRUE
+)
+```
+
+
+# Code for computational analysis
+
+```{r 'step 1'}
+
+```
+
+```{r 'step 2'}
+
+```
+
+
diff -r 157dc02992d9 -r 405d8fa2f560 deseq2.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/deseq2.sh Thu Mar 08 20:13:30 2018 -0500
@@ -0,0 +1,17 @@
+Rscript '${__tool_directory__}/deseq2_render.R'
+
+    -e $echo
+    -o $report
+    -d $report.files_path
+    -s $sink_message
+    -t '${__tool_directory__}'
+
+    -A $count_data
+    -B $column_data
+    -C $design_formula
+    -D $treatment_name
+    -E $treated
+    -F $untreated
+    -G $test_type
+    -H $fit_type
+    -I $alpha
\ No newline at end of file
diff -r 157dc02992d9 -r 405d8fa2f560 deseq2.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/deseq2.xml Thu Mar 08 20:13:30 2018 -0500
@@ -0,0 +1,106 @@
+
+ Differential analysis of count data with the DESeq2 package
+
+ pandoc
+ r-getopt
+ r-rmarkdown
+ r-plotly
+ r-ggplot2
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff -r 157dc02992d9 -r 405d8fa2f560 deseq2_render.R
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/deseq2_render.R Thu Mar 08 20:13:30 2018 -0500
@@ -0,0 +1,55 @@
+##============ Sink warnings and errors to a file ==============
+## use the sink() function to wrap all code within it.
+##==============================================================
+zz = file('warnings_and_errors.txt')
+sink(zz)
+sink(zz, type = 'message')
+
+#------------import libraries--------------------
+options(stringsAsFactors = FALSE)
+
+library(getopt)
+library(rmarkdown)
+library(ggplot2)
+library(plotly)
+library(htmltools)
+#------------------------------------------------
+
+
+#------------get arguments into R--------------------
+library(dplyr)
+getopt_specification_matrix(extract_short_flags('deseq2.xml')) %>%
+  write.table(file = 'spec.txt', sep = ',', row.names = FALSE, col.names = TRUE, quote = FALSE)
+
+
+spec_matrix = as.matrix()
+opt = getopt(spec_matrix)
+#----------------------------------------------------
+
+
+#-----------using passed arguments in R
+# to define system environment variables---
+do.call(Sys.setenv, opt[-1])
+#----------------------------------------------------
+
+#---------- often used variables ----------------
+# OUTPUT_DIR: path to the output associated directory, which stores all outputs
+# TOOL_DIR: path to the tool installation directory
+OUTPUT_DIR = ''
+TOOL_DIR = ''
+RMD_NAME = ''
+OUTPUT_REPORT = opt$X_o
+
+# create the output associated directory to store all outputs
+dir.create(OUTPUT_DIR, recursive = TRUE)
+
+#-----------------render Rmd--------------
+render(paste0(TOOL_DIR, '/', RMD_NAME), output_file = OUTPUT_REPORT)
+#------------------------------------------
+
+#==============the end==============
+
+
+##--------end of code rendering .Rmd templates----------------
+sink()
+##=========== End of sinking output=============================
\ No newline at end of file
diff -r 157dc02992d9 -r 405d8fa2f560 spec.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/spec.txt Thu Mar 08 20:13:30 2018 -0500
@@ -0,0 +1,6 @@
+long_flags,short_flags,argument_mask_flags,data_type_flags
+X_e,e,1,character
+X_o,o,1,character
+X_d,d,1,character
+X_s,s,1,character
+X_t,t,1,character