# HG changeset patch
# User ynewton
# Date 1349194233 14400
# Node ID 66689170153ed534a25c50565965405acf8479a4
# Parent 6eb6f52c9562181f52c0789b66b4d569ca189b68
Uploaded

diff -r 6eb6f52c9562 -r 66689170153e normalize.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/normalize.xml Tue Oct 02 12:10:33 2012 -0400
@@ -0,0 +1,56 @@
+
+ Matrix Normalize
+ normalize.r $genomicMatrix $normType $normBy
+#if str($normalList) != "None":
+ $controlColumnLabelsList
+#end if
+ > $outfile
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This tool takes data in matrix format and normalizes it using the chosen normalization options. The matrix is assumed to be annotated by column and row: the first line of the matrix file holds the column headers, and the first column of each row holds the row header.
+
+Data can be normalized either by row or by column. Note that the exponential, normal, and Weibull normalizations are always applied by column, regardless of the user's selection.
+
+The following normalizations are provided:
+
+1. Median shift: if no normals list is provided, computes the median of the whole row and subtracts it from each entry of the row. If normals are provided, computes the median of the normals and subtracts it from each non-normal value; only the non-normal samples are returned. If "Column" is selected for the normalize-by option, normals are ignored.
+
+2. Mean shift: if no normals list is provided, computes the mean of the whole row and subtracts it from each entry of the row. If normals are provided, computes the mean of the normals and subtracts it from each non-normal value; only the non-normal samples are returned. If "Column" is selected for the normalize-by option, normals are ignored.
+
+3. T-statistic (z-score): sometimes called standardization. The z-score is computed for each value of the row/column. If normals are specified, the z-score is computed within each class (normals and non-normals).
+
+4. Exponential normalization: performed by column/sample. All genes/probes in the column/sample are ranked. The inverse CDF (quantile function) of the exponential distribution is then applied to the ranks, transforming each rank into a real number from that distribution.
+
+5. Normal normalization: same as exponential normalization, but the quantile function (inverse CDF) of the normal distribution is applied.
+
+6. Weibull normalization: same as exponential normalization, but the quantile function (inverse CDF) of the Weibull distribution is applied with appropriate scale and shape parameters.
+
+
+The normals parameter is optional. It contains a list of column headers from the input matrix that should be treated as normals.
+
+
+
\ No newline at end of file
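
The median shift and exponential normalization described in the help text can be sketched as follows. This is an illustrative Python sketch, not the contents of `normalize.r`, which this patch does not include. The function names, the `rank / (n + 1)` mapping from ranks to probabilities, the handling of ties by sort order, and the default rate of 1 are all assumptions made for the example; the help text does not specify them.

```python
import math
from statistics import median


def median_shift_row(row, labels, normals=None):
    """Median-shift one matrix row (hypothetical helper).

    row     -- numeric values of the row, one per sample column
    labels  -- column headers, aligned with row
    normals -- optional list of column headers treated as normals
    Returns (labels, values). With normals, only non-normal samples
    are returned, as the help text describes.
    """
    if not normals:
        m = median(row)
        return list(labels), [v - m for v in row]
    normal_set = set(normals)
    # Median is taken over the normal samples only.
    m = median(v for v, lab in zip(row, labels) if lab in normal_set)
    kept = [(lab, v - m) for v, lab in zip(row, labels) if lab not in normal_set]
    return [lab for lab, _ in kept], [v for _, v in kept]


def exponential_normalize_column(column, rate=1.0):
    """Rank a column, then map ranks through the exponential quantile function.

    Assumption: rank r (1..n) is converted to a probability p = r / (n + 1)
    so that p stays strictly between 0 and 1. Ties are broken by sort order.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p = rank / (n + 1)
        # Exponential quantile function: Q(p) = -ln(1 - p) / rate
        out[i] = -math.log(1.0 - p) / rate
    return out


if __name__ == "__main__":
    print(median_shift_row([1.0, 2.0, 3.0, 10.0], ["N1", "N2", "T1", "T2"], ["N1", "N2"]))
    print(exponential_normalize_column([30.0, 10.0, 20.0]))
```

Swapping the quantile function in `exponential_normalize_column` for that of the normal or Weibull distribution would give the other two rank-based variants, under the same assumptions.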