view ttest/stats.xml @ 12:fd8529cd1564 default tip
better t-test
author | jingchunzhu |
---|---|
date | Mon, 28 Sep 2015 12:36:12 -0700 |
parents | cd4c13ae11ce |
children |
<tool id="ucscCancerBrowserStats" description="t-tests of difference in genomic data" name="Difference between categories (t-test)" version="0.0.1">
  <command interpreter="python">
    stats.py $genomicMatrix $clinicalFeatures $outFile -a="${category1}" -b="${category2}"
  </command>
  <inputs>
    <param format="tabular" name="genomicMatrix" type="data" label="Genomic Matrix"/>
    <param format="tabular" name="clinicalFeatures" type="data" label="Phenotype Matrix"/>
    <param type="text" name="category1" label="Category 1" optional="false"/>
    <param type="text" name="category2" label="Category 2" optional="false"/>
  </inputs>
  <outputs>
    <data format="tabular" name="outFile" />
  </outputs>
  <requirements>
    <requirement type="package" version="1.0" >cancerBrowserStats</requirement>
  </requirements>
  <tests>
    <test>
      <param name="genomicMatrix" value="sample.genomic.matrix.txt" />
      <param name="clinicalFeatures" value="sample.clinical.matrix.txt" />
      <param name="category1" value="A"/>
      <param name="category2" value="B"/>
      <output name="outFile" value="sample.stats.output.txt"/>
    </test>
  </tests>
  <help>
This tool performs a t-test on genomic data between two groups of samples, which can be used to identify, for example, differentially expressed genes or probes. The genomic data is in the UCSC Xena genomic matrix format (a tab-delimited matrix), with rows representing genes or probes and columns representing samples. The phenotype matrix assigns samples to groups. The tool compares the two groups and computes, for each probe/gene, the t-statistic, the p-value, and the difference between the group medians. The result can be downloaded and opened in a program such as Excel for sorting by t-statistic.

To run this tool:

1. Specify a genomic matrix. Rows represent genes and columns represent samples; the first line contains the sample names. The matrix can be obtained from UCSC Xena bulk download. See below for an example.

2. Specify a phenotype matrix. Rows represent samples and columns represent phenotypes or annotations. The matrix can be obtained from UCSC Xena heatmap download. See below for an example.

3. Specify the two categorical values that define the two groups. For example, the two groups could be A and B, or 0 and 1.

4. The output lists, for each probe/gene (one per row), the t-statistic, the p-value, the median value of each group, and the difference between the medians. If these values cannot be calculated, the tool returns NAs for that row.

**Input genomic matrix**::

    Gene  s1   s2   s3   s4   s5   s6   s7   s8   s9   s10
    G1    2.0  2.2  3.2  1.1  5.1  8.1  3.2  1.1  8.1  0.2
    G2    0.1  8.2  9.1  4.2  6.1  4.9  3.9  2.3  1.1  0.2

**Input phenotype matrix**::

    sample_id  Value
    s1         A
    s2         A
    s3         B
    s4         C
    s5         B
    s6         B
    s7         A
    s8         A
    s9         B
    s10        A

**Category 1 : A**

**Category 2 : B**

**Output**::

    Gene  Statistic  pValue    Median1   Median2   Delta
    G1    -4.168999  0.004194  2.000000  6.600000  -4.600000
    G2    -1.198486  0.269724  2.300000  5.500000  -3.200000
  </help>
</tool>
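
The worked example in the help text can be checked by hand with a short script. The following is a minimal sketch, not the contents of `stats.py` (which is not shown on this page): the function names `parse_phenotype`, `summarize`, and `compare_groups` are hypothetical, and the choice of variance formula is an assumption. The Statistic column of the example is reproduced when each group's variance is computed with the population formula (dividing by n) and then pooled with n1 + n2 - 2 degrees of freedom; this was inferred from the example numbers only. The p-value is omitted because the Python standard library has no t-distribution.

```python
import math
import statistics

# Example inputs copied from the tool's help text (tab-delimited).
GENOMIC = """Gene\ts1\ts2\ts3\ts4\ts5\ts6\ts7\ts8\ts9\ts10
G1\t2.0\t2.2\t3.2\t1.1\t5.1\t8.1\t3.2\t1.1\t8.1\t0.2
G2\t0.1\t8.2\t9.1\t4.2\t6.1\t4.9\t3.9\t2.3\t1.1\t0.2"""

PHENOTYPE = """sample_id\tValue
s1\tA
s2\tA
s3\tB
s4\tC
s5\tB
s6\tB
s7\tA
s8\tA
s9\tB
s10\tA"""


def parse_phenotype(text):
    """Map each sample name to its category (first data column)."""
    rows = [line.split("\t") for line in text.strip().splitlines()]
    return {row[0]: row[1] for row in rows[1:]}


def summarize(g1, g2):
    """Return t-statistic, medians and their difference, or None if undefined."""
    n1, n2 = len(g1), len(g2)
    if n1 < 2 or n2 < 2:
        return None
    # Assumption: population variance (divide by n), inferred from the
    # example output in the help text, not taken from stats.py.
    v1, v2 = statistics.pvariance(g1), statistics.pvariance(g2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    if se == 0:
        return None
    m1, m2 = statistics.median(g1), statistics.median(g2)
    t = (statistics.mean(g1) - statistics.mean(g2)) / se
    return {"Statistic": t, "Median1": m1, "Median2": m2, "Delta": m1 - m2}


def compare_groups(genomic_text, phenotype_text, cat1, cat2):
    """Compute per-gene summaries for samples labelled cat1 versus cat2."""
    labels = parse_phenotype(phenotype_text)
    lines = [line.split("\t") for line in genomic_text.strip().splitlines()]
    samples = lines[0][1:]
    results = {}
    for row in lines[1:]:
        values = [float(v) for v in row[1:]]
        g1 = [v for s, v in zip(samples, values) if labels.get(s) == cat1]
        g2 = [v for s, v in zip(samples, values) if labels.get(s) == cat2]
        results[row[0]] = summarize(g1, g2)
    return results


for gene, stats in compare_groups(GENOMIC, PHENOTYPE, "A", "B").items():
    print(gene, stats)
```

Sample `s4` belongs to category C, so it is ignored when comparing A with B. Switching `pvariance` to `variance` (the sample formula) gives the textbook pooled t-statistic, which differs from the values shown in the help example.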