<tool id="ucscCancerBrowserStats" description="t-tests of Difference in genomic data" name="Difference between categories (t-test)" version="0.0.1">
  <command interpreter="python">
    stats.py $genomicMatrix $clinicalFeatures $outFile -a="${category1}" -b="${category2}"
  </command>
  <inputs>
    <param format="tabular" name="genomicMatrix" type="data" label="Genomic Matrix"/>
    <param format="tabular" name="clinicalFeatures" type="data" label="Phenotype Matrix"/>
    <param type="text" name="category1" label="Category 1" optional="false"/>
    <param type="text" name="category2" label="Category 2" optional="false"/>
  </inputs>
  <outputs>
    <data format="tabular" name="outFile" />
  </outputs>
  <requirements>
    <requirement type="package" version="1.0" >cancerBrowserStats</requirement>
  </requirements>
  <tests>
    <test>
      <param name="genomicMatrix" value="sample.genomic.matrix.txt" />
      <param name="clinicalFeatures" value="sample.clinical.matrix.txt" />
      <param name="category1" value="A"/>
      <param name="category2" value="B"/>
      <output name="outFile" value="sample.stats.output.txt"/>
    </test>
  </tests>
  <help>

This tool performs statistical tests found in the UCSC Cancer Genomics
Browser. The input data consists of a genomic matrix (containing genomic
data, with rows representing genes or probes and columns representing
samples or patients), a clinical matrix of two (or more) columns
assigning categorical values to the samples, and two categorical
values of interest. The tool identifies the samples corresponding to
each categorical value, then identifies the columns in the genomic
matrix corresponding to those sets of samples, which yields two
groups of columns. For each row in the genomic matrix, it extracts
the values for those two groups of columns, performs a t-test on the two
sets of values, and returns the result for the row. Values in columns
that do NOT correspond to one of the categorical values of interest
are ignored.

The user runs this tool with the following steps:

1. Specify a genomic matrix. The expected format has rows representing
   genes and columns representing samples, with the sample names on the
   first line.

2. Specify a clinical matrix. Here, rows indicate samples, columns
   indicate clinical features, and the header row contains feature names.
   The first column MUST contain the sample names, and these MUST correspond
   to the column names of the genomic matrix. The clinical feature of
   interest MUST be in the second column. Any other columns will be
   ignored.

3. Indicate the two clinical values that define the two groups. For
   example, the two groups could be "Red group" and "Green group", or
   0 and 1.
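
The format requirements in steps 1 to 3 can be checked before running the tool. The following is a minimal sketch, not part of the tool itself: the helper names ``read_tsv`` and ``check_inputs`` are hypothetical, and it assumes both inputs are plain tab-separated text.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def check_inputs(genomic_rows, clinical_rows, category1, category2):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    # Sample names are the genomic header row without the leading gene column.
    genomic_samples = set(genomic_rows[0][1:])
    # Column 1 holds the sample name, column 2 the feature of interest.
    clinical = {row[0]: row[1] for row in clinical_rows[1:] if len(row) >= 2}

    unknown = sorted(s for s in clinical if s not in genomic_samples)
    if unknown:
        problems.append("samples missing from genomic matrix: " + ", ".join(unknown))

    present = set(clinical.values())
    for category in (category1, category2):
        if category not in present:
            problems.append("no sample has value %r in column 2" % category)
    return problems
```

An empty result means every clinical sample has a matching genomic column and both categories occur at least once.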

The output indicates, for each row, the t-statistic reporting on the
difference between the two groups of columns (as specified by the two
clinical values), the p-value corresponding to that t-statistic, the
median value for each group, and the difference between the medians. If
the tool cannot calculate these values for a row, it returns NA for each
of them.

For example, given the following genomic matrix for step 1::

    Gene  1    2    3    4    5    6    7    8    9    10
    G1    2.0  2.2  3.2  1.1  5.1  8.1  3.2  1.1  8.1  0.2
    G2    0.1  8.2  9.1  4.2  6.1  4.9  3.9  2.3  1.1  0.2

and given the following clinical matrix for step 2::

    sample_id  Value
    1          A
    2          A
    3          B
    4          C
    5          B
    6          B
    7          A
    8          A
    9          B
    10         A

and given A for Category 1 and B for Category 2,
the tool will assemble the following two groups of values::

    G1  A:(2.0, 2.2, 3.2, 1.1, 0.2)  B:(3.2, 5.1, 8.1, 8.1)
    G2  A:(0.1, 8.2, 3.9, 2.3, 0.2)  B:(9.1, 6.1, 4.9, 1.1)

Note that the values for sample_id 4 do not appear, because it has a Value
of C in the second column, which is neither A nor B.
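
The grouping step can be reproduced with a few lines of Python. This is an illustrative sketch using the example data above, not the code in ``stats.py``: the function name ``split_groups`` and the in-memory data layout are assumptions made for the illustration.

```python
# Example data copied from the matrices above.
SAMPLES = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
GENOMIC = {
    "G1": [2.0, 2.2, 3.2, 1.1, 5.1, 8.1, 3.2, 1.1, 8.1, 0.2],
    "G2": [0.1, 8.2, 9.1, 4.2, 6.1, 4.9, 3.9, 2.3, 1.1, 0.2],
}
CLINICAL = {
    "1": "A", "2": "A", "3": "B", "4": "C", "5": "B",
    "6": "B", "7": "A", "8": "A", "9": "B", "10": "A",
}


def split_groups(samples, values, clinical, category1, category2):
    """Split one genomic row into the values for each category, in column order."""
    group1 = [v for s, v in zip(samples, values) if clinical.get(s) == category1]
    group2 = [v for s, v in zip(samples, values) if clinical.get(s) == category2]
    return group1, group2


for gene, values in GENOMIC.items():
    group_a, group_b = split_groups(SAMPLES, values, CLINICAL, "A", "B")
    print(gene, "A:", group_a, "B:", group_b)
```

Sample 4 falls into neither list because its value, C, matches neither category.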

And it will return the output::

    Gene  Statistic  pValue    Median1   Median2   Delta
    G1    -4.168999  0.004194  2.000000  6.600000  -4.600000
    G2    -1.198486  0.269724  2.300000  5.500000  -3.200000
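
The Median1, Median2 and Delta columns can be checked by hand from the two groups in the example. The sketch below assumes Delta is Median1 minus Median2, which is consistent with the numbers shown. It deliberately leaves out the t-statistic and p-value, because this help text does not say which t-test variant the tool uses. The names ``summarize`` and ``format_row`` are hypothetical.

```python
from statistics import median


def summarize(group1, group2):
    """Return (median1, median2, delta), or NA markers if either group is empty."""
    if not group1 or not group2:
        return ("NA", "NA", "NA")
    m1 = median(group1)
    m2 = median(group2)
    return (m1, m2, m1 - m2)


def format_row(gene, summary):
    """Format one row with six decimals, passing NA markers through unchanged."""
    cells = [v if isinstance(v, str) else "%.6f" % v for v in summary]
    return "\t".join([gene] + cells)


# Groups taken from the example for G1 and G2.
print(format_row("G1", summarize([2.0, 2.2, 3.2, 1.1, 0.2], [3.2, 5.1, 8.1, 8.1])))
print(format_row("G2", summarize([0.1, 8.2, 3.9, 2.3, 0.2], [9.1, 6.1, 4.9, 1.1])))
```

An empty group yields NA in all three columns, mirroring the NA behaviour described for rows the tool cannot calculate.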

  </help>
</tool>