
Data Normalization (version 0.1.0)

Dataset Name:

Cel files dataset:

Dataset of CEL files previously uploaded with the Multiple File Datasets tool.

cdf file:

.cdf filename must have the format <chiptype>,<tag>.cdf (e.g., for a GenomeWideSNP_6 chip: GenomeWideSNP_6,Full.cdf).

ufl file:

.ufl filename must start with <chiptype>,<tag> (e.g., for a GenomeWideSNP_6 chip: GenomeWideSNP_6,Full,na31,hg19,HB20110328.ufl).

ugp file:

.ugp filename must start with <chiptype>,<tag> (e.g., for a GenomeWideSNP_6 chip: GenomeWideSNP_6,Full,na31,hg19,HB20110328.ugp).

acs file:

.acs filename must start with <chiptype>,<tag> (e.g., for a GenomeWideSNP_6 chip: GenomeWideSNP_6,HB20080710.acs).

Reference:

Output figures:

Output log:

What it does

This preprocessing step corrects biological and technical biases introduced by the experiment. Raw data from Affymetrix arrays are provided as separate CEL files, and these data must be normalized before statistical analysis. The preprocessing is implemented as a wrapper around the aroma.* packages (using CRMAv2, and TumorBoost when appropriate). Consequently, this preprocessing step is only available for Affymetrix arrays.

Chip file naming conventions

Chip filenames must strictly follow these rules:

.cdf filename must have the format <chiptype>,<tag>.cdf (e.g., for a GenomeWideSNP_6 chip: GenomeWideSNP_6,Full.cdf). Note the comma (not a period) between <chiptype> and the tag "Full".
.ufl filename must start with <chiptype>,<tag> (e.g., for a GenomeWideSNP_6 chip: GenomeWideSNP_6,Full,na31,hg19,HB20110328.ufl).
.ugp filename must start with <chiptype>,<tag> (e.g., for a GenomeWideSNP_6 chip: GenomeWideSNP_6,Full,na31,hg19,HB20110328.ugp).
.acs filename must start with <chiptype>,<tag> (e.g., for a GenomeWideSNP_6 chip: GenomeWideSNP_6,HB20080710.acs).
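The following Python sketch checks filenames against these rules before upload. It is only an illustration and is not part of the tool: the function names `parse_cdf_name` and `follows_prefix_rule` are hypothetical, and the sketch assumes that the tag itself contains no comma and that extensions are lowercase, as in the examples above.

```python
from pathlib import PurePath


def parse_cdf_name(filename: str) -> tuple[str, str]:
    """Split '<chiptype>,<tag>.cdf' into (chiptype, tag); raise ValueError otherwise."""
    path = PurePath(filename)
    if path.suffix != ".cdf":
        raise ValueError(f"{filename}: expected a .cdf extension")
    parts = path.stem.split(",")
    # Exactly one comma: '<chiptype>,<tag>' (assumes the tag contains no comma).
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"{filename}: expected '<chiptype>,<tag>.cdf'")
    return parts[0], parts[1]


def follows_prefix_rule(filename: str, chiptype: str, ext: str) -> bool:
    """True if filename has extension ext and its name starts with '<chiptype>,<tag>'."""
    path = PurePath(filename)
    if path.suffix != ext:
        return False
    parts = path.stem.split(",")
    # Needs '<chiptype>' plus a non-empty '<tag>'; extra comma-separated fields are allowed.
    return len(parts) >= 2 and parts[0] == chiptype and parts[1] != ""


if __name__ == "__main__":
    chiptype, tag = parse_cdf_name("GenomeWideSNP_6,Full.cdf")
    print(f"chip type: {chiptype}, tag: {tag}")
    for name, ext in [
        ("GenomeWideSNP_6,Full,na31,hg19,HB20110328.ufl", ".ufl"),
        ("GenomeWideSNP_6,Full,na31,hg19,HB20110328.ugp", ".ugp"),
        ("GenomeWideSNP_6,HB20080710.acs", ".acs"),
        ("GenomeWideSNP_6.Full.ufl", ".ufl"),  # period instead of comma: rejected
    ]:
        print(name, follows_prefix_rule(name, chiptype, ext))
```

The .acs check accepts any non-empty tag after the chip type, because the documented .acs example (HB20080710) uses a different tag than the .cdf file (Full).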

Normal-tumor study with TumorBoost

When normal (control) samples are matched to tumor samples, normalization can be improved with TumorBoost. In this case, a normal-tumor CSV file must be provided:

The first column contains the names of the files corresponding to the normal samples of the dataset.

The second column contains the names of the files corresponding to the matching tumor samples, so that each row pairs a normal sample with its tumor sample.

The header line names these two columns normal and tumor, respectively.

Columns are separated by a comma.

File extensions (.CEL, for example) should be removed.
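As a hedged sketch, the following Python function checks a normal-tumor CSV file against these rules using the standard `csv` module. The function name `read_normal_tumor_pairs` is hypothetical; the sketch only detects a leftover .cel/.CEL extension (other extensions are not checked) and does not verify that the names match files in the dataset.

```python
import csv
import io


def read_normal_tumor_pairs(csv_text: str) -> list[tuple[str, str]]:
    """Parse normal-tumor CSV text into (normal, tumor) pairs; raise ValueError on format problems."""
    reader = csv.reader(io.StringIO(csv_text))  # comma is csv's default delimiter
    header = next(reader, None)
    if header is None or [h.strip() for h in header] != ["normal", "tumor"]:
        raise ValueError(f"first line must be 'normal,tumor', got {header}")
    pairs = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 2 or not all(field.strip() for field in row):
            raise ValueError(f"line {reader.line_num}: expected two non-empty columns, got {row}")
        normal, tumor = row[0].strip(), row[1].strip()
        for name in (normal, tumor):
            # Only a leftover .cel/.CEL extension is detected here.
            if name.lower().endswith(".cel"):
                raise ValueError(f"line {reader.line_num}: remove the extension from {name!r}")
        pairs.append((normal, tumor))
    return pairs
```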

Example

Consider a dataset of 6 .cel files from 3 patients, each represented by a pair of normal and tumor .cel files:

patient1_normal.cel
patient1_tumor.cel
patient2_normal.cel
patient2_tumor.cel
patient3_normal.cel
patient3_tumor.cel

The CSV file should then look like this:

normal,tumor
patient1_normal,patient1_tumor
patient2_normal,patient2_tumor
patient3_normal,patient3_tumor
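For larger datasets, the CSV file can be generated rather than written by hand. This Python sketch assumes the hypothetical naming convention of the example above, `<patient>_normal.cel` and `<patient>_tumor.cel`; the tool does not require this convention, so the pairing logic in `build_normal_tumor_csv` (also a hypothetical name) would need adapting to other filenames.

```python
import csv
import io
from pathlib import PurePath


def build_normal_tumor_csv(cel_files: list[str]) -> str:
    """Pair '<patient>_normal.cel' with '<patient>_tumor.cel' and return normal-tumor CSV text."""
    normals: dict[str, str] = {}
    tumors: dict[str, str] = {}
    for filename in cel_files:
        stem = PurePath(filename).stem  # drop the .cel extension, as the tool requires
        patient, _, status = stem.rpartition("_")
        if status == "normal":
            normals[patient] = stem
        elif status == "tumor":
            tumors[patient] = stem
        else:
            raise ValueError(f"{filename}: expected a '_normal' or '_tumor' suffix")
    if normals.keys() != tumors.keys():
        raise ValueError("every patient needs both a normal and a tumor file")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["normal", "tumor"])
    for patient in sorted(normals):  # sorted only to make the output deterministic
        writer.writerow([normals[patient], tumors[patient]])
    return out.getvalue()
```

With the six example files listed above, `build_normal_tumor_csv` returns exactly the CSV text shown above.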

Citation

When using this tool, please cite:

Q. Grimonprez, A. Celisse, M. Cheok, M. Figeac, and G. Marot. MPAgenomics: An R package for multi-patients analysis of genomic markers. Preprint, 2014.

As CRMAv2 normalization is used, please also cite: H. Bengtsson, P. Wirapati, and T. P. Speed. A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6. Bioinformatics, 25(17):2149–2156, 2009.

When using TumorBoost to improve normalization in a normal-tumor study, please also cite: H. Bengtsson, P. Neuvial, and T. P. Speed. TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics, 11, 2010.