view ttest/stats.xml @ 12:fd8529cd1564 default tip

better t-test
author jingchunzhu
date Mon, 28 Sep 2015 12:36:12 -0700
parents cd4c13ae11ce

<tool id="ucscCancerBrowserStats" name="Difference between categories (t-test)" version="0.0.1">
  <description>t-tests of difference in genomic data</description>
  <command interpreter="python">
    stats.py  $genomicMatrix $clinicalFeatures $outFile -a="${category1}" -b="${category2}"
  </command>
  <inputs>
    <param format="tabular" name="genomicMatrix" type="data" label="Genomic Matrix"/>
    <param format="tabular" name="clinicalFeatures" type="data" label="Phenotype Matrix"/>
    <param type="text" name="category1" label="Category 1" optional="false"/>
    <param type="text" name="category2" label="Category 2" optional="false"/>
  </inputs>
  <outputs>
    <data format="tabular" name="outFile" />
  </outputs>
  <requirements>
    <requirement type="package" version="1.0" >cancerBrowserStats</requirement>
  </requirements>
  <tests>
    <test>
      <param name="genomicMatrix" value="sample.genomic.matrix.txt" />
      <param name="clinicalFeatures" value="sample.clinical.matrix.txt" />
      <param name="category1" value="A"/>
      <param name="category2" value="B"/>
      <output name="outFile" value="sample.stats.output.txt"/>
    </test>
  </tests>
  <help>

This tool performs a t-test on genomic data between two groups of samples, which can be used to identify, for example, differentially expressed genes or probes.  The genomic data is in the UCSC Xena genomic matrix format (a tab-delimited matrix), with rows representing genes or probes and columns representing samples. The phenotype matrix assigns samples to groups. The tool compares the two groups of samples and computes the t-statistic, the p-value, and the difference of medians for each probe/gene. The result can be downloaded and opened in programs such as Excel for sorting by t-statistic.

The user runs this tool with the following steps:

1. Specify a genomic matrix.  The expected format has rows representing genes and columns representing samples, with the sample names in the first line. The matrix can be obtained from UCSC Xena bulk download. See below for an example.


2. Specify a phenotype matrix.  Here, rows represent samples and columns represent phenotypes or annotations.  The matrix can be obtained from UCSC Xena heatmap download. See below for an example.


3. Specify the two categorical values that you want to use for defining the two groups.  For example, the two groups could be A and B, 0 and 1, etc.


4. The output is, for each probe/gene (one per row), the t-statistic, the p-value, the median value for each group, and the difference between the medians.  If the tool cannot calculate these values for a row, it returns NAs for that row.


**Input genomic matrix**::

    Gene  s1   s2   s3   s4   s5   s6   s7   s8   s9   s10
    G1    2.0  2.2  3.2  1.1  5.1  8.1  3.2  1.1  8.1  0.2
    G2    0.1  8.2  9.1  4.2  6.1  4.9  3.9  2.3  1.1  0.2

**Input phenotype matrix**::

    sample_id  Value
    s1         A
    s2         A
    s3         B
    s4         C
    s5         B
    s6         B
    s7         A
    s8         A
    s9         B
    s10        A
    
**Category 1 : A**

**Category 2 : B**

**Output**::

    Gene Statistic  pValue    Median1   Median2   Delta
    G1   -4.168999  0.004194  2.000000  6.600000  -4.600000
    G2   -1.198486  0.269724  2.300000  5.500000  -3.200000


  </help>
</tool>
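<!--
Illustrative sketch, not part of the tool definition. It reproduces the Median1, Median2 and Delta columns of the help example to show how the phenotype matrix assigns samples to the two groups. The names `split_groups` and `median_delta` are hypothetical and are not taken from stats.py. The t-statistic and p-value are left out because this file does not state which t-test variant stats.py uses.

```python
from statistics import median

# Example data copied from the help text of the tool.
samples = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9", "s10"]
genomic = {
    "G1": [2.0, 2.2, 3.2, 1.1, 5.1, 8.1, 3.2, 1.1, 8.1, 0.2],
    "G2": [0.1, 8.2, 9.1, 4.2, 6.1, 4.9, 3.9, 2.3, 1.1, 0.2],
}
# Phenotype matrix: one category per sample, in the same order as `samples`.
phenotype = dict(zip(samples, "AABCBBAABA"))


def split_groups(values, category1, category2):
    """Split one gene row into the values of category1 and of category2."""
    group1 = [v for s, v in zip(samples, values) if phenotype[s] == category1]
    group2 = [v for s, v in zip(samples, values) if phenotype[s] == category2]
    return group1, group2


def median_delta(values, category1, category2):
    """Return (median of group 1, median of group 2, their difference)."""
    group1, group2 = split_groups(values, category1, category2)
    m1, m2 = median(group1), median(group2)
    return m1, m2, m1 - m2


for gene, values in genomic.items():
    m1, m2, delta = median_delta(values, "A", "B")
    print(f"{gene} {m1:.6f} {m2:.6f} {delta:.6f}")
# G1 2.000000 6.600000 -4.600000
# G2 2.300000 5.500000 -3.200000
```

Samples in any other category (here s4, category C) fall into neither group.
-->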