annotate mergeGenomicFiles.xml @ 60:bf57076e27b9 default tip

change genomicSegment input data
author jingchunzhu@gmail.com
date Tue, 27 Oct 2015 16:07:09 -0700
parents eb5acf81e609
children
<tool id="mergeGenomicFiles" description="Merge two genomic matrices into a new dataset" name="Merge Genomic Matrix Datasets" version="0.0.1">
  <description>
    Given two genomic datasets, merge them into a larger dataset containing the row and column identifiers from both. Output the merged dataset, along with a 2-column matrix indicating the source file (or label) of each sample.
  </description>
  <command interpreter="python">
    mergeGenomicMatrixFiles.py $inputA $inputB $outputC $outputSourceMatrix
    #if $labelForDatasetA
      --aLabel "${labelForDatasetA}"
    #else
      --aLabel "${inputA.name}"
    #end if
    #if $labelForDatasetB
      --bLabel "${labelForDatasetB}"
    #else
      --bLabel "${inputB.name}"
    #end if
  </command>
  <inputs>
    <param name="inputA" format="tabular" type="data" label="Genomic Matrix A"/>
    <param type="text" name="labelForDatasetA" label="Dataset A Label (e.g. LGG)" value="A"/>
    <param name="inputB" format="tabular" type="data" label="Genomic Matrix B"/>
    <param type="text" name="labelForDatasetB" label="Dataset B Label (e.g. GBM)" value="B"/>
  </inputs>
  <outputs>
    <data name="outputSourceMatrix" format="tabular" label="Data Source ${labelForDatasetA}+${labelForDatasetB}"/>
    <data name="outputC" format="tabular" label="Genomic Matrix ${labelForDatasetA}+${labelForDatasetB}"/>
  </outputs>
  <help>
**Merge Genomic Datasets**

The output Genomic Matrix is formatted as rows (identifiers) by columns (samples), ready to be imported into a Xena Hub.

The output Data Source matrix is formatted as rows (samples) by columns (identifiers), ready to be imported into a Xena Hub.

Given two genomic datasets, merge them to produce a third dataset that is the union of the first two. The new dataset contains all column labels and all row labels from either dataset. If a row label appears in both datasets, the output row contains all values for the first set of columns, followed by all values for the second set of columns. If a row label appears in only one dataset, the output row contains that dataset's values for its own columns, and blanks (indicating missing values) for the columns of the other dataset.

To maintain provenance, this script also outputs a second matrix, with one row for each column in the output dataset, and two columns per row indicating which input dataset that column came from. By default, the input dataset name is used to indicate which input file each column came from. Optionally, the user can specify descriptive labels to be used in place of the filenames. This assumes that each column exists in only one input dataset.
  </help>
</tool>
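
The union merge and provenance matrix described in the help text can be illustrated with a small Python sketch. This is not the code of mergeGenomicMatrixFiles.py, which is not shown here. The function name `merge_matrices`, its in-memory data layout, and the choice to raise an error on duplicated column names are all assumptions made for illustration.

```python
def merge_matrices(cols_a, rows_a, cols_b, rows_b, label_a="A", label_b="B"):
    """Union-merge two identifier-by-sample matrices (hypothetical helper).

    cols_*: list of sample (column) names.
    rows_*: dict mapping a row identifier to its list of values, one per column.
    """
    # The help text assumes each column exists in only one input dataset;
    # rejecting overlaps outright is this sketch's choice, not the tool's.
    overlap = set(cols_a) & set(cols_b)
    if overlap:
        raise ValueError("columns present in both inputs: %s" % sorted(overlap))

    merged_cols = list(cols_a) + list(cols_b)
    blank_a = [""] * len(cols_a)  # blanks stand for missing values
    blank_b = [""] * len(cols_b)

    # Rows of A first, then rows that appear only in B.
    row_ids = list(rows_a) + [r for r in rows_b if r not in rows_a]
    merged_rows = {
        r: list(rows_a.get(r, blank_a)) + list(rows_b.get(r, blank_b))
        for r in row_ids
    }

    # Provenance: one (sample, source label) pair per output column.
    sources = [(c, label_a) for c in cols_a] + [(c, label_b) for c in cols_b]
    return merged_cols, merged_rows, sources


# Example with made-up values: TP53 is in both inputs, EGFR only in A, IDH1 only in B.
cols, rows, sources = merge_matrices(
    ["s1", "s2"], {"TP53": ["0.1", "0.2"], "EGFR": ["1.5", "1.1"]},
    ["s3"], {"TP53": ["0.3"], "IDH1": ["2.0"]},
    label_a="LGG", label_b="GBM",
)
for row_id, values in rows.items():
    print("\t".join([row_id] + values))
```

In this sketch the provenance list has one entry per merged column, which corresponds to the second output dataset (`outputSourceMatrix`) of the tool.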