comparison ttest/stats.xml @ 0:12bb38e187b9

Uploaded, initial check-in
author melissacline
date Mon, 28 Jul 2014 20:12:17 -0400
parents
children a04e3c59e117
-1:000000000000 0:12bb38e187b9
1 <tool id="ucscCancerBrowserStats" description="Statistical Tests of Difference" name="UCSC Cancer Browser Stats" version="0.0.1">
2 <description>Apply statistical tests of difference to the rows in a genomic matrix, where the columns are categorized by a second (clinical) matrix</description>
3 <command interpreter="python">
4 stats.py $genomicMatrix $clinicalFeatures $outFile -a="${category1}" -b="${category2}"
5 </command>
6 <inputs>
7 <param format="tabular" name="genomicMatrix" type="data" label="Genomic Matrix"/>
8 <param format="tabular" name="clinicalFeatures" type="data" label="Clinical Matrix"/>
9 <param type="text" name="category1" label="Category 1" optional="false"/>
10 <param type="text" name="category2" label="Category 2" optional="false"/>
11 </inputs>
12 <outputs>
13 <data format="tabular" name="outFile" />
14 </outputs>
15 <requirements>
16 <requirement type="python-module">numpy</requirement>
17 </requirements>
18 <tests>
19 <param name="genomicMatrix" value="sample.genomic.matrix.txt"/>
20 <param name="clinicalFeatures" value="sample.clinical.matrix.txt"/>
21 <param name="category1" value="A"/>
22 <param name="category2" value="B"/>
23 <output name="outFile" value="sample.stats.output.txt"/>
24 </tests>
25 <help>
26
27 This tool performs statistical tests found in the UCSC Cancer Genomics
28 Browser. The input data is a genomic matrix (containing genomic data,
29 with rows representing genes or probes and columns representing
30 samples or patients), a clinical matrix of two (or more) columns
31 assigning categorical values to the samples, and two categorical
32 values of interest. The tool identifies the samples corresponding to
33 each categorical value, then identifies the columns in the genomic
34 matrix corresponding to those sets of samples, yielding two
35 groups of columns. For each row in the genomic matrix, it extracts
36 the values in those two groups of columns, performs a t-test on the two
37 sets of values, and returns the result for the row. Values in
38 columns that do NOT correspond to either categorical value of
39 interest are ignored.
40
41 The user runs this tool with the following steps:
42
43
44 1. Specify a genomic matrix. In the expected format, rows represent
45 genes, columns represent samples, and the first line contains the sample
46 names.
47
48 2. Specify a clinical matrix. Here, rows indicate samples, columns
49 indicate clinical features, and the header row contains feature names.
50 The first column MUST indicate the sample names, and MUST correspond
51 to the column names of the genomic matrix. The clinical feature of
52 interest MUST be in the second column. Any other columns will be
53 ignored.
54
55
56 3. Indicate the two clinical values that define the
57 two groups. For example, the two groups could be "Red group" and
58 "Green group", 0 and 1, or any other pair of values found in that column.
59
60 The output indicates, for each row, the t-statistic reporting on the
61 difference between the two groups of columns (as specified by the two
62 clinical values), the p-value corresponding to that t-statistic, the
63 median value for each group, and the difference between the medians. If it
64 cannot calculate these values, it returns a vector of NAs.
65
66 For example, given the following genomic matrix for (1)::
67
68 Gene 1 2 3 4 5 6 7 8 9 10
69 G1 2.0 2.2 3.2 1.1 5.1 8.1 3.2 1.1 8.1 0.2
70 G2 0.1 8.2 9.1 4.2 6.1 4.9 3.9 2.3 1.1 0.2
71
72 and given the following clinical matrix for (2)::
73
74 sample_id Value
75 1 A
76 2 A
77 3 B
78 4 C
79 5 B
80 6 B
81 7 A
82 8 A
83 9 B
84 10 A
85
86 and given A for Category 1 and B for Category 2,
87
88 the tool will assemble the following two groups of values::
89
90 G1 A:(2.0, 2.2, 3.2, 1.1, 0.2) B:(3.2, 5.1, 8.1, 8.1)
91 G2 A:(0.1, 8.2, 3.9, 2.3, 0.2) B:(9.1, 6.1, 4.9, 1.1)
92
93 Note that the values for sample_id 4 do not appear, because it has a Value
94 of C in the second column, which is neither A nor B.
95
96 And it will return the output::
97
98 Gene Statistic pValue Median1 Median2 Delta
99 G1 -4.168999 0.004194 2.000000 6.600000 -4.600000
100 G2 -1.198486 0.269724 2.300000 5.500000 -3.200000
101
102
103 </help>
104 </tool>
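
The column-grouping step described in the help text can be sketched in Python as follows. This is a minimal illustration and not the contents of stats.py, which is not shown in this changeset: the function names `group_columns` and `summarize_rows` are hypothetical, the matrices are assumed to be already split into lists of fields, and only the group assembly and the median columns are covered. The t-statistic and p-value are left out because the help text does not say which t-test variant the tool applies.

```python
import statistics


def group_columns(genomic_header, clinical_rows, category1, category2):
    """Return the genomic column indices that belong to each category."""
    # First clinical column: sample name. Second column: feature of interest.
    sample_to_value = {row[0]: row[1] for row in clinical_rows}
    cols1, cols2 = [], []
    # Column 0 of the genomic matrix holds the gene name, so start at 1.
    for index, sample in enumerate(genomic_header[1:], start=1):
        value = sample_to_value.get(sample)
        if value == category1:
            cols1.append(index)
        elif value == category2:
            cols2.append(index)
        # Samples with any other value (or no clinical row) are ignored.
    return cols1, cols2


def summarize_rows(genomic_header, genomic_rows, clinical_rows,
                   category1, category2):
    """Collect both groups of values and their medians for every row."""
    cols1, cols2 = group_columns(genomic_header, clinical_rows,
                                 category1, category2)
    summary = {}
    for row in genomic_rows:
        values1 = [float(row[i]) for i in cols1]
        values2 = [float(row[i]) for i in cols2]
        if values1 and values2:
            median1 = statistics.median(values1)
            median2 = statistics.median(values2)
            delta = median1 - median2
        else:
            # Assumption: an empty group means nothing can be computed.
            median1 = median2 = delta = None
        summary[row[0]] = {
            "group1": values1,
            "group2": values2,
            "median1": median1,
            "median2": median2,
            "delta": delta,
        }
    return summary


# Example data copied from the help text above.
genomic_header = ["Gene"] + [str(n) for n in range(1, 11)]
genomic_rows = [
    ["G1", "2.0", "2.2", "3.2", "1.1", "5.1", "8.1", "3.2", "1.1", "8.1", "0.2"],
    ["G2", "0.1", "8.2", "9.1", "4.2", "6.1", "4.9", "3.9", "2.3", "1.1", "0.2"],
]
clinical_rows = [[sample, value]
                 for sample, value in zip(genomic_header[1:], "AABCBBAABA")]

summary = summarize_rows(genomic_header, genomic_rows, clinical_rows, "A", "B")
for gene, stats in summary.items():
    print(gene, stats["group1"], stats["group2"],
          stats["median1"], stats["median2"], round(stats["delta"], 6))
```

Real tabular inputs would first have to be split on tabs, for instance with `csv.reader(handle, delimiter="\t")`, so that category values containing spaces such as "Red group" stay intact.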