annotate ttest/stats.xml @ 2:f1929875b1b3

Bug fix: added support for the condition that the clinical vector (from which the two categorical values are drawn) might be empty for some rows
author melissacline
date Wed, 06 Aug 2014 12:34:04 -0700
parents 12bb38e187b9
children a04e3c59e117
0
12bb38e187b9 Uploaded, initial check-in
melissacline
<tool id="ucscCancerBrowserStats" description="Statistical Tests of Difference" name="UCSC Cancer Browser Stats" version="0.0.1">
  <description>Apply statistical tests of difference to the rows in a genomic matrix, where the columns are categorized by a second (clinical) matrix</description>
  <command interpreter="python">
    stats.py $genomicMatrix $clinicalFeatures $outFile -a="${category1}" -b="${category2}"
  </command>
  <inputs>
    <param format="tabular" name="genomicMatrix" type="data" label="Genomic Matrix"/>
    <param format="tabular" name="clinicalFeatures" type="data" label="Clinical Matrix"/>
    <param type="text" name="category1" label="Category 1" optional="false"/>
    <param type="text" name="category2" label="Category 2" optional="false"/>
  </inputs>
  <outputs>
    <data format="tabular" name="outFile" />
  </outputs>
  <requirements>
    <requirement type="python-module">numpy</requirement>
  </requirements>
  <tests>
    <test>
      <param name="genomicMatrix" value="sample.genomic.matrix.txt"/>
      <param name="clinicalFeatures" value="sample.clinical.matrix.txt"/>
      <param name="category1" value="A"/>
      <param name="category2" value="B"/>
      <output name="outFile" value="sample.stats.output.txt"/>
    </test>
  </tests>
  <help>

This tool performs statistical tests found in the UCSC Cancer Genomics
Browser. The inputs are a genomic matrix (with rows representing genes
or probes and columns representing samples or patients), a clinical
matrix of two or more columns assigning categorical values to the
samples, and two categorical values of interest. The tool identifies
the samples corresponding to each categorical value, then finds the
columns in the genomic matrix corresponding to those sets of samples,
which yields two groups of columns. For each row in the genomic matrix,
it extracts the values in those two groups of columns, performs a t-test
on the two sets of values, and returns the result for the row. Values in
columns that do NOT belong to either categorical value of interest are
ignored.
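
The per-row comparison described above can be sketched in a few lines of Python. This is only an illustration, not the code in stats.py: the function name t_statistic is hypothetical, and the sketch assumes a pooled-variance (Student's) t-test, since this help text does not say which variant the tool uses. It returns None where the tool would report NA.

```python
import math
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Pooled-variance (Student's) t-statistic, or None when undefined.

    Assumption: the t-test variant is not stated in the help text; a Welch
    test would use a different standard error and degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Each group needs at least two values to have a sample variance.
    if min(n_a, n_b) >= 2:
        pooled = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
        std_err = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
        if std_err > 0:
            return (mean(group_a) - mean(group_b)) / std_err
    # Empty group, single value, or zero variance: nothing to report.
    return None
```

A row whose category columns are empty, for example t_statistic([], [1.0, 2.0]), yields None rather than raising an error, which mirrors the NA behaviour described further down.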

To run this tool, follow these steps:

1. Specify a genomic matrix. The expected format has rows representing
   genes and columns representing samples, with the sample names in the
   first line.

2. Specify a clinical matrix. Here, rows indicate samples, columns
   indicate clinical features, and the header row contains feature names.
   The first column MUST contain the sample names, which MUST correspond
   to the column names of the genomic matrix. The clinical feature of
   interest MUST be in the second column. Any other columns will be
   ignored.

3. Indicate the two clinical values that define the two groups. For
   example, the two groups could be "Red group" and "Green group",
   or 0 and 1.

For each row, the output reports the t-statistic for the difference
between the two groups of columns (as specified by the two clinical
values), the p-value corresponding to that t-statistic, the median
value for each group, and the difference between the medians. If the
tool cannot calculate these values, it returns a vector of NAs.

For example, given the following genomic matrix for (1)::

    Gene  1    2    3    4    5    6    7    8    9    10
    G1    2.0  2.2  3.2  1.1  5.1  8.1  3.2  1.1  8.1  0.2
    G2    0.1  8.2  9.1  4.2  6.1  4.9  3.9  2.3  1.1  0.2

and given the following clinical matrix for (2)::

    sample_id  Value
    1          A
    2          A
    3          B
    4          C
    5          B
    6          B
    7          A
    8          A
    9          B
    10         A

and given A for Category 1 and B for Category 2, the tool will assemble
the following two groups of values::

    G1  A:(2.0, 2.2, 3.2, 1.1, 0.2)  B:(3.2, 5.1, 8.1, 8.1)
    G2  A:(0.1, 8.2, 3.9, 2.3, 0.2)  B:(9.1, 6.1, 4.9, 1.1)

Note that the values for sample_id 4 do not appear, because it has a Value
of C in the second column, which is neither A nor B.

The tool will then return the output::

    Gene  Statistic  pValue    Median1   Median2   Delta
    G1    -4.168999  0.004194  2.000000  6.600000  -4.600000
    G2    -1.198486  0.269724  2.300000  5.500000  -3.200000
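
The grouping step and the median columns of this example can be reproduced with a short Python sketch. It is not taken from stats.py: the names samples, genomic, clinical, split_row and median_summary are hypothetical, and the data is copied by hand from the matrices above. Only the grouping and the medians are covered here; the t-statistic and p-value are left out.

```python
from statistics import median

# Worked-example inputs copied from the help text above.
samples = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
genomic = {
    "G1": [2.0, 2.2, 3.2, 1.1, 5.1, 8.1, 3.2, 1.1, 8.1, 0.2],
    "G2": [0.1, 8.2, 9.1, 4.2, 6.1, 4.9, 3.9, 2.3, 1.1, 0.2],
}
clinical = {"1": "A", "2": "A", "3": "B", "4": "C", "5": "B",
            "6": "B", "7": "A", "8": "A", "9": "B", "10": "A"}


def split_row(values, category1, category2):
    """Return the row's values for each category; other samples are skipped."""
    # clinical.get() tolerates samples with no clinical entry at all.
    group1 = [v for s, v in zip(samples, values) if clinical.get(s) == category1]
    group2 = [v for s, v in zip(samples, values) if clinical.get(s) == category2]
    return group1, group2


def median_summary(values, category1="A", category2="B"):
    """Return (median1, median2, delta), or None if either group is empty."""
    group1, group2 = split_row(values, category1, category2)
    if not group1 or not group2:
        return None
    m1, m2 = median(group1), median(group2)
    return m1, m2, m1 - m2


for gene, values in genomic.items():
    m1, m2, delta = median_summary(values)
    print(f"{gene}\t{m1:.6f}\t{m2:.6f}\t{delta:.6f}")
```

The printed medians and deltas agree with the Median1, Median2 and Delta columns in the output table: sample 4 (Value C) is skipped, and Delta is Median1 minus Median2.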

  </help>
</tool>