What it does
This preprocessing step corrects biological and technical biases introduced by the experiment. Raw data from Affymetrix arrays are provided as separate CEL files. These data must be normalized before statistical analysis. The preprocessing is implemented as a wrapper around the aroma.* packages (using CRMAv2, and TumorBoost when appropriate). As a consequence, this preprocessing step is only available for Affymetrix arrays.
Chip file naming conventions
Chip filenames must strictly follow these rules:
Normal-tumor study with TumorBoost
When normal (control) samples are matched to tumor samples, normalization can be improved using TumorBoost. In this case, a normal-tumor CSV file must be provided:
- The first column contains the names of the files corresponding to normal samples of the dataset.
- The second column contains the names of the files corresponding to tumor samples.
- The two columns are named normal and tumor, respectively.
- Columns are separated by commas.
- File extensions (.CEL, for example) must be removed.
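The rules above can be checked before submitting the file. The following is a minimal Python sketch, not part of the tool itself: the function name check_pairs_csv is hypothetical, and the check only covers the header, the column count, and leftover .CEL extensions.

```python
import csv
import io


def check_pairs_csv(text):
    """Return a list of problems found in a normal-tumor CSV given as a string.

    Hypothetical helper for illustration; it is not provided by the tool.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(text)))

    # The header must name the two columns normal and tumor, in that order.
    if not rows or rows[0] != ["normal", "tumor"]:
        problems.append("header must be exactly: normal,tumor")

    # Every following line must hold one normal name and one tumor name.
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != 2:
            problems.append(f"line {lineno}: expected 2 columns, found {len(row)}")
            continue
        for name in row:
            # File extensions must have been removed.
            if name.lower().endswith(".cel"):
                problems.append(f"line {lineno}: remove the extension from {name}")
    return problems
```

An empty list means that none of the checked rules is violated; it does not guarantee that the listed names match files in the dataset.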
Example
Consider a dataset of 6 .cel files (3 patients, each represented by a pair of normal and tumor .cel files):
patient1_normal.cel
patient1_tumor.cel
patient2_normal.cel
patient2_tumor.cel
patient3_normal.cel
patient3_tumor.cel
The CSV file should look like this:
normal,tumor
patient1_normal,patient1_tumor
patient2_normal,patient2_tumor
patient3_normal,patient3_tumor
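For larger datasets, such a file can be generated rather than typed by hand. The Python sketch below is only an illustration: the function name build_pairs_csv is hypothetical, and it assumes that file names end in _normal and _tumor as in this example, which the tool itself may not require.

```python
import csv
import io
from pathlib import Path


def build_pairs_csv(filenames, normal_suffix="_normal", tumor_suffix="_tumor"):
    """Build the normal-tumor CSV content from a list of CEL file names.

    Hypothetical helper; pairing by suffix is an assumption taken from
    the example file names, not a rule of the tool.
    """
    normals, tumors = {}, {}
    for filename in filenames:
        stem = Path(filename).stem  # drops the extension (.cel, .CEL, ...)
        if stem.endswith(normal_suffix):
            normals[stem[: -len(normal_suffix)]] = stem
        elif stem.endswith(tumor_suffix):
            tumors[stem[: -len(tumor_suffix)]] = stem

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["normal", "tumor"])
    # Keep only patients that have both a normal and a tumor file.
    for patient in sorted(normals.keys() & tumors.keys()):
        writer.writerow([normals[patient], tumors[patient]])
    return out.getvalue()


cel_files = [
    "patient1_normal.cel", "patient1_tumor.cel",
    "patient2_normal.cel", "patient2_tumor.cel",
    "patient3_normal.cel", "patient3_tumor.cel",
]
print(build_pairs_csv(cel_files), end="")
```

Patients with only one of the two files are silently skipped in this sketch; a stricter version could report them instead.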
Citation
When using this tool, please cite:
As CRMAv2 normalization is used, please also cite: H. Bengtsson, P. Wirapati, and T. P. Speed. A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6. Bioinformatics, 25(17):2149–2156, 2009.
When using TumorBoost to improve normalization in a normal-tumor study, please cite: H. Bengtsson, P. Neuvial, and T. P. Speed. TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics, 11, 2010.