comparison ttest/stats.xml @ 0:12bb38e187b9
Uploaded, initial check-in
author | melissacline |
---|---|
date | Mon, 28 Jul 2014 20:12:17 -0400 |
parents | |
children | a04e3c59e117 |
-1:000000000000 | 0:12bb38e187b9 |
---|---|
1 <tool id="ucscCancerBrowserStats" description="Statistical Tests of Difference" name="UCSC Cancer Browser Stats" version="0.0.1"> | |
2 <description>Apply statistical tests of difference to the rows in a genomic matrix, where the columns are categorized by a second (clinical) matrix</description> | |
3 <command interpreter="python"> | |
4 stats.py $genomicMatrix $clinicalFeatures $outFile -a="${category1}" -b="${category2}" | |
5 </command> | |
6 <inputs> | |
7 <param format="tabular" name="genomicMatrix" type="data" label="Genomic Matrix"/> | |
8 <param format="tabular" name="clinicalFeatures" type="data" label="Clinical Matrix"/> | |
9 <param type="text" name="category1" label="Category 1" optional="false"/> | |
10 <param type="text" name="category2" label="Category 2" optional="false"/> | |
11 </inputs> | |
12 <outputs> | |
13 <data format="tabular" name="outFile" /> | |
14 </outputs> | |
15 <requirements> | |
16 <requirement type="python-module">numpy</requirement> | |
17 </requirements> | |
18 <tests><test> | |
19 <param name="genomicMatrix" value="sample.genomic.matrix.txt"/> | |
20 <param name="clinicalFeatures" value="sample.clinical.matrix.txt"/> | |
21 <param name="category1" value="A"/> | |
22 <param name="category2" value="B"/> | |
23 <output name="outFile" value="sample.stats.output.txt"/> | |
24 </test></tests> | |
25 <help> | |
26 | |
27 This tool performs statistical tests found in the UCSC Cancer Genomics | |
28 Browser. The input data is a genomic matrix (containing genomic data, | |
29 with rows representing genes or probes and columns representing | |
30 samples or patients), a clinical matrix of two (or more) columns | |
31 assigning categorical values to the samples, and two categorical | |
32 values of interest. The tool identifies the samples corresponding to | |
33 each categorical value, then identifies the columns in the genomic | |
34 matrix corresponding to those sets of samples, which identifies two | |
35 groups of columns. For each row in the genomic matrix, it extracts | |
36 the values for those two sets of columns, performs a t-test on the two | |
37 sets of values, and returns the result for the row. Any values for | |
38 any columns NOT pertaining to one of the categorical values of | |
39 interest are ignored. | |
40 | |
41 The user runs this tool with the following steps: | |
42 | |
43 | |
44 1. Specify a genomic matrix. Rows represent genes or probes, columns | |
45 represent samples, and the first line contains the sample | |
46 names. | |
47 | |
48 2. Specify a clinical matrix. Here, rows indicate samples, columns | |
49 indicate clinical features, and the header row contains feature names. | |
50 The first column MUST indicate the sample names, and MUST correspond | |
51 to the column names of the genomic matrix. The clinical feature of | |
52 interest MUST be in the second column. Any other columns will be | |
53 ignored. | |
54 | |
55 | |
56 3. Indicate the two clinical values that define the | |
57 two groups. For example, the two groups could be "Red group" and | |
58 "Green group", or 0 and 1. | |
59 | |
60 The output indicates, for each row, the t-statistic reporting on the | |
61 difference between the two groups of columns (as specified by the two | |
62 clinical values), the p-value corresponding to that t-statistic, the | |
63 median value for each group, and the difference between the medians. If the | |
64 tool cannot calculate these values, it returns a vector of NAs. | |
65 | |
66 For example, given the following genomic matrix for (1):: | |
67 | |
68 Gene 1 2 3 4 5 6 7 8 9 10 | |
69 G1 2.0 2.2 3.2 1.1 5.1 8.1 3.2 1.1 8.1 0.2 | |
70 G2 0.1 8.2 9.1 4.2 6.1 4.9 3.9 2.3 1.1 0.2 | |
71 | |
72 and given the following clinical matrix for (2):: | |
73 | |
74 sample_id Value | |
75 1 A | |
76 2 A | |
77 3 B | |
78 4 C | |
79 5 B | |
80 6 B | |
81 7 A | |
82 8 A | |
83 9 B | |
84 10 A | |
85 | |
86 and given A for Category 1 and B for Category 2, | |
87 | |
88 the tool will assemble the following two groups of values:: | |
89 | |
90 G1 A:(2.0, 2.2, 3.2, 1.1, 0.2) B:(3.2, 5.1, 8.1, 8.1) | |
91 G2 A:(0.1, 8.2, 3.9, 2.3, 0.2) B:(9.1, 6.1, 4.9, 1.1) | |
92 | |
93 Note that the values for sample_id 4 do not appear, because it has a Value | |
94 of C in the second column, which is neither A nor B. | |
95 | |
96 And it will return the output:: | |
97 | |
98 Gene Statistic pValue Median1 Median2 Delta | |
99 G1 -4.168999 0.004194 2.000000 6.600000 -4.600000 | |
100 G2 -1.198486 0.269724 2.300000 5.500000 -3.200000 | |
101 | |
102 | |
103 </help> | |
104 </tool> |
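
The grouping step described in the help text can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the contents of stats.py: the function names `read_categories` and `group_medians` are hypothetical, the inputs are assumed to be tab-separated, and only the grouping, the medians, and their difference are computed. The t-statistic, the p-value, and the NA fallback are left out because the source does not say how stats.py computes them.

```python
import csv
import io
from statistics import median

# The example matrices from the help text, assumed to be tab-separated.
GENOMIC = (
    "Gene\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\n"
    "G1\t2.0\t2.2\t3.2\t1.1\t5.1\t8.1\t3.2\t1.1\t8.1\t0.2\n"
    "G2\t0.1\t8.2\t9.1\t4.2\t6.1\t4.9\t3.9\t2.3\t1.1\t0.2\n"
)
CLINICAL = (
    "sample_id\tValue\n"
    "1\tA\n2\tA\n3\tB\n4\tC\n5\tB\n6\tB\n7\tA\n8\tA\n9\tB\n10\tA\n"
)


def read_categories(clinical_text):
    """Map each sample name (first column) to its category (second column)."""
    rows = csv.reader(io.StringIO(clinical_text), delimiter="\t")
    next(rows)  # skip the header row of feature names
    # Columns beyond the second are ignored, as the help text describes.
    return {row[0]: row[1] for row in rows if len(row) >= 2}


def group_medians(genomic_text, categories, cat1, cat2):
    """Yield (gene, values1, values2, median1, median2, delta) for each row."""
    rows = csv.reader(io.StringIO(genomic_text), delimiter="\t")
    samples = next(rows)[1:]  # header row: sample names after the gene column
    # Column positions per category; samples in any other category are skipped.
    idx1 = [i for i, s in enumerate(samples) if categories.get(s) == cat1]
    idx2 = [i for i, s in enumerate(samples) if categories.get(s) == cat2]
    for row in rows:
        values = [float(v) for v in row[1:]]
        group1 = [values[i] for i in idx1]
        group2 = [values[i] for i in idx2]
        m1, m2 = median(group1), median(group2)
        yield row[0], group1, group2, m1, m2, m1 - m2


results = list(group_medians(GENOMIC, read_categories(CLINICAL), "A", "B"))
for gene, group1, group2, m1, m2, delta in results:
    print(gene, group1, group2, round(m1, 6), round(m2, 6), round(delta, 6))
```

Run on the example inputs, the sketch builds the same A and B groups listed in the help text, with sample 4 (category C) excluded from both.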