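<!-- mergeGenomicFiles.xml, Mercurial repository melissacline/ucsc_cancer_utilities, revision 50:b6f5d2d1b047 ("fix")
     author: jingchunzhu; date: Tue, 25 Aug 2015 23:42:17 -0700; parent: eb5acf81e609
     Includes changeset 1d150e860c4d (melissacline): "Expanded the functionality of the merge genomic datasets tool, to generate an output dataset with the file (or label) indicating where each column came from" -->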
<tool id="mergeGenomicFiles" description="Merge two genomic matrices into a new dataset" name="Merge Genomic Matrix Datasets" version="0.0.1">
  <description>
    Given two genomic datasets, merge them to create a larger dataset with the row and column identifiers from both datasets. Output this larger dataset, along with a two-column matrix indicating the source dataset of each sample.
  </description>
  <command interpreter="python">
    mergeGenomicMatrixFiles.py $inputA $inputB $outputC $outputSourceMatrix
    #if $labelForDatasetA
      --aLabel "${labelForDatasetA}"
    #else
      --aLabel "${inputA.name}"
    #end if
    #if $labelForDatasetB
      --bLabel "${labelForDatasetB}"
    #else
      --bLabel "${inputB.name}"
    #end if
  </command>
  <inputs>
    <param name="inputA" format="tabular" type="data" label="Genomic Matrix A"/>
    <param type="text" name="labelForDatasetA" label="Dataset A Label (e.g. LGG)" value="A"/>
    <param name="inputB" format="tabular" type="data" label="Genomic Matrix B"/>
    <param type="text" name="labelForDatasetB" label="Dataset B Label (e.g. GBM)" value="B"/>
  </inputs>
  <outputs>
    <data name="outputSourceMatrix" format="tabular" label="Data Source ${labelForDatasetA}+${labelForDatasetB}"/>
    <data name="outputC" format="tabular" label="Genomic Matrix ${labelForDatasetA}+${labelForDatasetB}"/>
  </outputs>
  <help>
**Merge Genomic Datasets**

The output Genomic Matrix has Rows (Identifiers) by Columns (Samples) and is ready to be imported into a Xena Hub.

The output Data Source has Rows (Samples) by Columns (Identifiers) and is ready to be imported into a Xena Hub.

Given two genomic datasets, merge them to produce a third dataset that is the union of the first two. The new dataset contains every column label and every row label that appears in either input. If a row label appears in both datasets, the output row holds all values for the first dataset's columns followed by all values for the second dataset's columns. If a row label appears in only one dataset, the output row holds that dataset's values plus blanks (indicating missing values) for the columns of the other dataset.
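
The merge rule above can be sketched in Python. This is a minimal, hypothetical illustration and not the code of mergeGenomicMatrixFiles.py. It assumes tab-separated input with sample IDs in the header row and a row identifier in the first column of every other row. The names read_matrix and merge_matrices are invented for this example.

```python
import csv
import io


def read_matrix(text):
    """Parse a tab-separated genomic matrix (hypothetical helper).

    Assumes the header row holds the column (sample) IDs after a corner cell,
    and every later row starts with a row identifier followed by its values.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    corner, samples = rows[0][0], rows[0][1:]
    data = {row[0]: row[1:] for row in rows[1:]}
    return corner, samples, data


def merge_matrices(text_a, text_b):
    """Union merge: keep every row ID and column ID from both inputs,
    leaving blanks where a row is missing from one dataset."""
    corner, samples_a, data_a = read_matrix(text_a)
    _, samples_b, data_b = read_matrix(text_b)
    # Rows of A keep their order; rows found only in B are appended after them.
    row_ids = list(data_a) + [r for r in data_b if r not in data_a]
    blank_a = [""] * len(samples_a)
    blank_b = [""] * len(samples_b)
    merged = [[corner] + samples_a + samples_b]
    for row_id in row_ids:
        values = data_a.get(row_id, blank_a) + data_b.get(row_id, blank_b)
        merged.append([row_id] + values)
    return merged


a = "id\tS1\tS2\ng1\t1\t2\ng2\t3\t4\n"
b = "id\tS3\ng2\t5\ng3\t6\n"
for row in merge_matrices(a, b):
    print("\t".join(row))
```

In this example, row g2 appears in both inputs and gets all three values. Rows g1 and g3 each get blanks for the columns of the dataset they are missing from.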

To maintain provenance, the tool also outputs a second matrix with one row for each column (sample) of the output dataset and two columns per row indicating which input dataset that column came from. Each input dataset is identified by the label entered for it (A and B by default). If a label field is left blank, the input dataset name is used instead. This assumes that each column identifier appears in only one input dataset.
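
The provenance matrix and the label fallback used in the command section can be sketched the same way. This is again a hypothetical illustration. The function name source_matrix and the sample/source header row are assumptions, and the real script's output layout may differ. The sketch also rejects column IDs that occur in both inputs, because the tool assumes each column comes from only one dataset. The actual script may handle that case differently.

```python
def source_matrix(samples_a, samples_b, label_a, label_b, name_a, name_b):
    """Build a two-column provenance matrix (hypothetical helper): one row
    per output column (sample), paired with the dataset it came from."""
    # Same fallback as the #if/#else blocks in the command: use the label
    # when one is given, otherwise the input dataset name.
    source_a = label_a or name_a
    source_b = label_b or name_b
    # The help text assumes each column exists in only one input dataset.
    shared = set(samples_a).intersection(samples_b)
    if shared:
        raise ValueError("columns found in both inputs: " + ", ".join(sorted(shared)))
    rows = [["sample", "source"]]  # hypothetical header row
    rows += [[sample, source_a] for sample in samples_a]
    rows += [[sample, source_b] for sample in samples_b]
    return rows


print(source_matrix(["S1", "S2"], ["S3"], "LGG", "", "a.tsv", "b.tsv"))
# [['sample', 'source'], ['S1', 'LGG'], ['S2', 'LGG'], ['S3', 'b.tsv']]
```

Because the B label is left blank in this call, S3 is attributed to the dataset name b.tsv.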
  </help>
</tool>