annotate ttest/stats.xml @ 6:6ca836b7e0b4

Changed the importing for stats.py
author melissacline
date Mon, 13 Oct 2014 18:52:54 -0700
parents 12bb38e187b9
children a04e3c59e117
rev 0: 12bb38e187b9 Uploaded, initial check-in (melissacline)
<tool id="ucscCancerBrowserStats" description="Statistical Tests of Difference" name="UCSC Cancer Browser Stats" version="0.0.1">
  <description>Apply statistical tests of difference to the rows in a genomic matrix, where the columns are categorized by a second (clinical) matrix</description>
  <command interpreter="python">
    stats.py $genomicMatrix $clinicalFeatures $outFile -a="${category1}" -b="${category2}"
  </command>
  <inputs>
    <param format="tabular" name="genomicMatrix" type="data" label="Genomic Matrix"/>
    <param format="tabular" name="clinicalFeatures" type="data" label="Clinical Matrix"/>
    <param type="text" name="category1" label="Category 1" optional="false"/>
    <param type="text" name="category2" label="Category 2" optional="false"/>
  </inputs>
  <outputs>
    <data format="tabular" name="outFile" />
  </outputs>
  <requirements>
    <requirement type="python-module">numpy</requirement>
  </requirements>
  <tests>
    <test>
      <param name="genomicMatrix" value="sample.genomic.matrix.txt"/>
      <param name="clinicalFeatures" value="sample.clinical.matrix.txt"/>
      <param name="category1" value="A"/>
      <param name="category2" value="B"/>
      <output name="outFile" value="sample.stats.output.txt"/>
    </test>
  </tests>
  <help>

This tool performs statistical tests found in the UCSC Cancer Genomics
Browser. The inputs are a genomic matrix (containing genomic data,
with rows representing genes or probes and columns representing
samples or patients), a clinical matrix of two (or more) columns
assigning categorical values to the samples, and two categorical
values of interest. The tool identifies the samples corresponding to
each categorical value, then identifies the columns in the genomic
matrix corresponding to those sets of samples, yielding two
groups of columns. For each row in the genomic matrix, it extracts
the values for those two groups of columns, performs a t-test on the two
sets of values, and returns the result for the row. Values in
columns that do NOT correspond to either categorical value of
interest are ignored.

To run this tool, follow these steps:

1. Specify a genomic matrix. Rows represent genes, columns represent
   samples, and the first line contains the sample names.

2. Specify a clinical matrix. Here, rows indicate samples, columns
   indicate clinical features, and the header row contains feature names.
   The first column MUST contain the sample names, and these MUST correspond
   to the column names of the genomic matrix. The clinical feature of
   interest MUST be in the second column. Any other columns will be
   ignored.

3. Indicate the two clinical values that define the two groups. For
   example, the two groups could be "Red group" and "Green group",
   0 and 1, or any other pair of values found in the second column.

The output indicates, for each row, the t-statistic reporting on the
difference between the two groups of columns (as specified by the two
clinical values), the p-value corresponding to that t-statistic, the
median value for each group, and the difference between the medians. If
the tool cannot calculate these values, it returns a vector of NAs.

For example, given the following genomic matrix for (1)::

    Gene  1    2    3    4    5    6    7    8    9    10
    G1    2.0  2.2  3.2  1.1  5.1  8.1  3.2  1.1  8.1  0.2
    G2    0.1  8.2  9.1  4.2  6.1  4.9  3.9  2.3  1.1  0.2

and given the following clinical matrix for (2)::

    sample_id  Value
    1          A
    2          A
    3          B
    4          C
    5          B
    6          B
    7          A
    8          A
    9          B
    10         A

and given A for Category 1 and B for Category 2,
the tool will assemble the following two groups of values::

    G1  A:(2.0, 2.2, 3.2, 1.1, 0.2)  B:(3.2, 5.1, 8.1, 8.1)
    G2  A:(0.1, 8.2, 3.9, 2.3, 0.2)  B:(9.1, 6.1, 4.9, 1.1)

Note that the values for sample_id 4 do not appear, because it has a Value
of C in the second column, which is neither A nor B.

It will then return the output::

    Gene  Statistic  pValue    Median1   Median2   Delta
    G1    -4.168999  0.004194  2.000000  6.600000  -4.600000
    G2    -1.198486  0.269724  2.300000  5.500000  -3.200000

  </help>
</tool>
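
The grouping and per-row summary described in the help text can be sketched in a few lines of Python. This is an illustrative sketch only, not the code in stats.py: the function names (`group_columns`, `pooled_t`, `row_stats`) are hypothetical, and the t-statistic formula is an assumption. The help text does not say which t-test variant the tool uses; a pooled-variance statistic built from population variances (dividing by n rather than n - 1) happens to reproduce the Statistic column of the example to about four decimal places, so that is what is shown here. The p-value is left out because the Python standard library has no t-distribution.

```python
import math
import statistics

# Example data copied from the help text.
header = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
clinical = [("1", "A"), ("2", "A"), ("3", "B"), ("4", "C"), ("5", "B"),
            ("6", "B"), ("7", "A"), ("8", "A"), ("9", "B"), ("10", "A")]
genomic = {
    "G1": [2.0, 2.2, 3.2, 1.1, 5.1, 8.1, 3.2, 1.1, 8.1, 0.2],
    "G2": [0.1, 8.2, 9.1, 4.2, 6.1, 4.9, 3.9, 2.3, 1.1, 0.2],
}


def group_columns(header, clinical, cat_a, cat_b):
    """Return the column indices whose sample is labelled cat_a or cat_b."""
    label = dict(clinical)  # sample name -> value in the second column
    idx_a = [i for i, s in enumerate(header) if label.get(s) == cat_a]
    idx_b = [i for i, s in enumerate(header) if label.get(s) == cat_b]
    return idx_a, idx_b


def pooled_t(a, b):
    """Pooled-variance t-statistic.

    ASSUMPTION: population variances (pvariance, i.e. ddof=0) are used
    because that matches the example output; stats.py may differ.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.pvariance(a)
              + (nb - 1) * statistics.pvariance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))


def row_stats(values, idx_a, idx_b):
    """Return (statistic, median1, median2, delta) for one matrix row."""
    a = [values[i] for i in idx_a]
    b = [values[i] for i in idx_b]
    med_a, med_b = statistics.median(a), statistics.median(b)
    return pooled_t(a, b), med_a, med_b, med_a - med_b


idx_a, idx_b = group_columns(header, clinical, "A", "B")
for gene, values in genomic.items():
    stat, med_a, med_b, delta = row_stats(values, idx_a, idx_b)
    print(f"{gene}\t{stat:.6f}\t{med_a:.6f}\t{med_b:.6f}\t{delta:.6f}")
```

Sample 4 (Value C) falls into neither index list, so its column is skipped, which matches the note in the help text.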