annotate normalize.xml @ 10:0c4dcd3980c1 draft default tip

Uploaded
author ynewton
date Sat, 20 Oct 2012 02:28:54 -0400
parents 277a79e23357
children
rev   line source
10
0c4dcd3980c1 Uploaded
ynewton
parents: 8
diff changeset
1 <tool id="matrix_normalize" name="Matrix Normalize" version="2.0.0">
8
277a79e23357 Uploaded
ynewton
parents:
diff changeset
2 <description>Matrix Normalize</description>
3 <command interpreter="Rscript">normalize.r $genomicMatrix $normType $normBy
4 #if str($controlColumnLabelsList) != "None":
5 $controlColumnLabelsList
6 #end if
7 > $outfile
8 </command>
9 <inputs>
10 <param name="genomicMatrix" type="data" label="Genomic Matrix"/>
11 <param name="normBy" type="select" label="normalize by (row or column)">
12 <option value="row">ROW</option>
13 <option value="column">COLUMN</option>
14 </param>
15 <param name="normType" type="select" label="type of normalization">
16 <option value="median_shift">Median Shift</option>
17 <option value="mean_shift">Mean Shift</option>
18 <option value="t_statistic">Student t-statistic (z-scores)</option>
19 <option value="exponential_fit">Exponential Distribution Normalization</option>
20 <option value="normal_fit">Normal Distribution Normalization</option>
21 <option value="weibull_0.5_fit">Weibull Distribution Normalization (scale=1,shape=0.5)</option>
22 <option value="weibull_1_fit">Weibull Distribution Normalization (scale=1,shape=1)</option>
23 <option value="weibull_1.5_fit">Weibull Distribution Normalization (scale=1,shape=1.5)</option>
24 <option value="weibull_5_fit">Weibull Distribution Normalization (scale=1,shape=5)</option>
25 </param>
26 <param name="controlColumnLabelsList" optional="true" type="data" label="Controls"/>
27 </inputs>
28 <outputs>
29 <data name="outfile" format="tabular"/>
30 </outputs>
31 <help>
32 **What it does**
33
34 This tool takes a data matrix and normalizes it using the selected normalization options. The matrix is assumed to be annotated by row and column: the first line of the file holds the column headers, and the first field of each row holds the row header.
35
36 Data can be normalized either by row or by column. Note that the exponential, normal, and Weibull normalizations are always performed by column, regardless of the user selection.
37
38 The following normalizations are provided:
39
40 1. Median shift: if no normals list is provided, computes the median of the whole row and subtracts it from each entry of the row. If normals are provided, computes the median of the normal samples and subtracts it from each value of the non-normal samples; in that case only the non-normal samples are returned. If "COLUMN" is selected under "normalize by", normals are ignored.
41
42 2. Mean shift: if no normals list is provided, computes the mean of the whole row and subtracts it from each entry of the row. If normals are provided, computes the mean of the normal samples and subtracts it from each value of the non-normal samples; in that case only the non-normal samples are returned. If "COLUMN" is selected under "normalize by", normals are ignored.
43
44 3. T-statistic (z-score): sometimes called standardization. A z-score is computed for each value of the row or column. If normals are specified, the z-score is computed within each class (normals and non-normals).
45
46 4. Exponential normalization: performed by column (sample). All genes/probes in the column are ranked, and the inverse CDF (quantile function) of the exponential distribution is then applied to the ranks, mapping each rank to a real value from that distribution.
47
48 5. Normal normalization: same as exponential normalization, but the quantile function (inverse CDF) of the normal distribution is applied.
49
50 6. Weibull normalizations: same as exponential normalization, but the quantile function (inverse CDF) of the Weibull distribution is applied, with the scale and shape parameters of the selected option.
51
52
10
0c4dcd3980c1 Uploaded
ynewton
parents: 8
diff changeset
53 The normals/controls parameter is optional. It contains either a list of column headers from the input matrix that should be treated as normals/controls, or a separate matrix of normal/control samples. The tool distinguishes between the two cases automatically and processes the normals/controls accordingly. When both the main expression matrix and a normals/controls matrix are supplied and normalization is column-wise, the tool concatenates the two matrices and returns a combined matrix, containing both tumor and normal/control samples, in which the samples are normalized.
8
277a79e23357 Uploaded
ynewton
parents:
diff changeset
54
55 </help>
56 </tool>
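
The help text above describes two families of normalization: shifting a row by its median (optionally the median of the control samples only), and the rank-based fits, which rank a column and push the ranks through a quantile function. The following is a minimal Python sketch of both, not the code in normalize.r, which is not shown here. The function names are hypothetical, and two details are assumptions because the help text does not specify them: ranks are converted to probabilities as rank / (n + 1), and ties are broken by position rather than averaged.

```python
import math
from statistics import NormalDist, median


def median_shift_row(values, labels, controls=None):
    """Shift one row by its median, or by the median of the control columns.

    Hypothetical helper: `labels` are the column headers for `values`,
    `controls` is an optional list of headers to treat as normals.
    """
    if not controls:
        center = median(values)
        return {lab: v - center for v, lab in zip(values, labels)}
    control_set = set(controls)
    center = median(v for v, lab in zip(values, labels) if lab in control_set)
    # As described in the help text, only non-control samples are returned.
    return {lab: v - center
            for v, lab in zip(values, labels) if lab not in control_set}


def rank_quantile_column(values, quantile):
    """Replace each value in a column by quantile(rank / (n + 1)).

    Assumption: the rank-to-probability mapping rank / (n + 1) keeps every
    probability strictly between 0 and 1; the original script may differ.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # ties keep input order
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = quantile(rank / (n + 1))
    return out


def exponential_quantile(p, rate=1.0):
    """Inverse CDF of the exponential distribution."""
    return -math.log(1.0 - p) / rate


def weibull_quantile(p, shape, scale=1.0):
    """Inverse CDF of the Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Inverse CDF of the standard normal distribution (mean 0, sd 1).
normal_quantile = NormalDist().inv_cdf
```

For example, `rank_quantile_column(column, lambda p: weibull_quantile(p, shape=0.5))` would correspond to the idea behind the `weibull_0.5_fit` option under these assumptions. A Weibull distribution with scale 1 and shape 1 is the exponential distribution with rate 1, so in this sketch `weibull_quantile(p, shape=1)` and `exponential_quantile(p)` give the same values.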