comparison aurora_wgcna_trait.Rmd @ 3:b915c2c92e66 draft

Uploaded
author spficklin
date Wed, 20 Nov 2019 00:56:30 +0000
parents
children ffbafe466107
comparison
equal deleted inserted replaced
2:d947c300617c 3:b915c2c92e66
1 ---
2 title: 'WGCNA: Module-Trait Association Report.'
3 output:
4 html_document:
5 number_sections: false
6 toc: true
7 ---
8
9 ```{r setup, include=FALSE, warning=FALSE, message=FALSE}
10 knitr::opts_chunk$set(error = TRUE, echo = FALSE)
11 ```
12 ```{r}
13 # Load the data from the previous step.
14 load(file=opt$r_data)
15 ```
16 # Sample Annotation Data
17
18 The table below shows the list of sample annotations you provided. Sample annotations can include categorical data related to experimental conditions, or numerical trait or phenotype data for each sample.
19
20 ```{r}
21 trait_data = data.frame()
22 trait_data = read.csv(opt$trait_data, header = TRUE, row.names = opt$sname_col)
23 sample_names = rownames(gemt)
24 trait_rows = match(sample_names, rownames(trait_data))
25 trait_data = trait_data[trait_rows, ]
26 datatable(trait_data)
27 ```
28 # Module-Condition Association.
29 Now that we have sample annotations, we can explore if any of the network modules are asociated with these features. First, is an empirical exploration by viewing again the sample dendrogram but with sample annotations added and colored by category or numerical intensity, as appropriate. If groups of samples with similar expression also share similar annotations then the same colors will appear "in blocks" under the clustered samples. This view does not indicate associations but can help visualize when some modules might be associated.
30
31 ```{r fig.align='center', fig.width=8, fig.height=9}
32
33 # Determine the column types within the trait annotation data.
34 trait_types = sapply(trait_data, class)
35
36 # Set the colors for the quantitative data.
37 quantitative_fields = which(trait_types == "numeric")
38 quantitative_colors = numbers2colors(trait_data[,quantitative_fields], signed = FALSE)
39 colnames(quantitative_colors) = colnames(trait_data[,quantitative_fields])
40
41 # Set the colors for the categorical data.
42 categorical_fields = which(trait_types == "factor")
43 categorical_colors = labels2colors(trait_data[,categorical_fields])
44 colnames(categorical_colors) = colnames(trait_data[,categorical_fields])
45
46 # Set the colors for the ordinal data.
47 ordinal_fields = which(trait_types == "integer")
48 ordinal_colors = numbers2colors(trait_data[,ordinal_fields], signed = FALSE)
49 colnames(ordinal_colors) = colnames(trait_data[,ordinal_fields])
50
51 # Combine the colors in their original order as the trait_data data frame.
52 traitColors = cbind(quantitative_colors, categorical_colors, ordinal_colors)
53
54 plotDendroAndColors(sampleTree, traitColors,
55 groupLabels = names(trait_data),
56 main = "Sample Dendrogram and Annotation Heatmap",
57 cex.dendroLabels = 0.5)
58
59 ```
60
61 To statistically identify the associations, correlation tests are performed of the eigengenes of each module with the annotation data. The following heatmap shows the results between each annotation feature and each module. Modules with a signficant positive assocation have a correlation value near 1. Modules with a significant negative association have a correlation value near -1. Modules with no correlation have a value near 0.
62
63 ```{r fig.align='center', fig.width=15, fig.height=15}
64 MEs = orderMEs(MEs)
65 moduleTraitCor = cor(MEs, trait_data, use = "p");
66 moduleTraitPvalue = corPvalueStudent(moduleTraitCor, n_samples);
67
68 textMatrix = paste(signif(moduleTraitCor, 2), "\n(", signif(moduleTraitPvalue, 1), ")", sep = "");
69 dim(textMatrix) = dim(moduleTraitCor)
70 par(mar = c(6, 8.5, 3, 3));
71
72 # Display the correlation values within a heatmap plot
73 labeledHeatmap(Matrix = moduleTraitCor,
74 xLabels = names(trait_data),
75 yLabels = names(MEs),
76 ySymbols = names(MEs),
77 colorLabels = FALSE,
78 colors = blueWhiteRed(50),
79 textMatrix = textMatrix,
80 setStdMargins = FALSE,
81 cex.text = 0.5,
82 zlim = c(-1,1),
83 main = paste("Module-trait relationships"))
84 ```
85
86 ```{r}
87 output = cbind(moduleTraitCor, moduleTraitPvalue)
88 write.csv(output, file = opt$module_association_file, quote=FALSE, row.names=TRUE)
89 ```
90 A file has been generated named `module_association.csv` which conatins the list of modules, and their correlation values as well as p-values indicating the strength of the associations.
91 ```{r}
92 # names (colors) of the modules
93 modNames = substring(names(MEs), 3)
94 geneModuleMembership = as.data.frame(cor(gemt, MEs, use = "p"));
95 MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), n_samples));
96 names(geneModuleMembership) = paste("MM", modNames, sep="");
97 names(MMPvalue) = paste("p.MM", modNames, sep="");
98
99 # Calculate the gene trait significance as a Pearson's R and p-value.
100 gts = as.data.frame(cor(gemt, trait_data, use = "p"));
101 gtsp = as.data.frame(corPvalueStudent(as.matrix(gts), n_samples));
102 colnames(gtsp) = c(paste("p", names(trait_data), sep="."))
103 colnames(gts) = c(paste("GS", names(trait_data), sep="."))
104
105 # Write out the gene information.
106 output = cbind(Module = module_labels, gts, gtsp)
107 write.csv(output, file = opt$gene_association_file, quote=FALSE, row.names=TRUE)
108
109 ```
110 Genes themselves can also have assocation with sample annotations. This is calculated via a traditional correlation test as well. Another file has been generated named `gene_association.csv` which provides the list of genes, the modules they belong to and the assocaition of each gene to the sample annotation features.
111