view mergeMutationDatasets.xml @ 37:e81019e3ac99

Updated synapseGetDataset to look at the filename rather than the (no longer existent) content type field to determine whether the data is in zip format.
author melissacline
date Mon, 27 Jul 2015 16:29:24 -0700
parents 3a259686f0fc
children 9806198df91f

<tool id="mergeMutationDatasets" description="Merge two Xena positional mutation datasets into a new dataset" name="Merge Positional Mutation Data" version="0.0.1">
  <description>
    Given two mutation datasets, merge them to create a larger dataset with the mutations from both. Also output a 2-column matrix indicating the source of each mutation.
  </description>
  <command interpreter="python">
      mergeXenaMutation.py $outputC $outputSourceMatrix $errorLog $inputA $inputB
      #if $labelForDatasetA
          --aLabel "${labelForDatasetA}"
      #end if
      #if $labelForDatasetB
          --bLabel "${labelForDatasetB}"
      #end if
  </command>
  <inputs>
    <param name="inputA" format="tabular" type="data" label="Mutation Dataset A"/>
    <param type="text" name="labelForDatasetA"  label="Dataset A Label (optional)" optional="true"/>
    <param name="inputB" format="tabular" type="data" label="Mutation Dataset B"/> 
    <param type="text" name="labelForDatasetB"  label="Dataset B Label (optional)" optional="true"/>
  </inputs>
  <outputs>
    <data name="errorLog" format="data" label="Execution Log"/>
    <data name="outputSourceMatrix" format="tabular" label="Mutation Data Sources"/> 
    <data name="outputC" format="tabular" label="Merged Mutation Data"/>
  </outputs>
  <help>
    **Merge Xena Positional Mutation Datasets**

    Given two datasets of mutation data as formatted for the UCSC Xena Browser, merge them to produce a third dataset that is the union of the first two.  The new dataset will contain all mutations from either dataset. 

    To maintain provenance, this script also outputs a second matrix, with one row for each sample ID that appears in the output dataset, and two columns per row indicating which input dataset(s) contained some mutation data for that sample.  By default, the input dataset name is used to indicate which input file each column came from.  Optionally, the user can specify descriptive labels to be used in place of the dataset names.
  </help>
</tool>
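
The merge described in the help text can be sketched in Python roughly as follows. This is not the code of mergeXenaMutation.py, which is not shown here; it is a minimal illustration under stated assumptions: the first field of each row is the sample ID, a mutation row present in both inputs is kept once, and the source matrix uses 1/0 flags. The names `merge_mutations`, `label_a`, and `label_b` are hypothetical.

```python
def merge_mutations(rows_a, rows_b, label_a="A", label_b="B"):
    """Return (merged_rows, source_matrix) for two lists of mutation rows.

    Assumption: the first field of every row is the sample ID.
    """
    merged = []
    seen = set()
    for row in rows_a + rows_b:
        key = tuple(row)
        if key not in seen:  # assumed: identical rows from both inputs are kept once
            seen.add(key)
            merged.append(row)

    # One row per sample ID, two columns flagging which input(s) had data for it.
    samples_a = {row[0] for row in rows_a}
    samples_b = {row[0] for row in rows_b}
    source_matrix = [["sample", label_a, label_b]]
    for sample in sorted(samples_a | samples_b):
        source_matrix.append(
            [sample, int(sample in samples_a), int(sample in samples_b)]
        )
    return merged, source_matrix


if __name__ == "__main__":
    # Made-up rows for illustration only: sample, chromosome, start, end, ref, alt.
    a = [["S1", "chr1", "100", "100", "A", "T"],
         ["S2", "chr2", "200", "200", "G", "C"]]
    b = [["S2", "chr2", "200", "200", "G", "C"],
         ["S3", "chr3", "300", "300", "T", "G"]]
    merged, sources = merge_mutations(a, b, "Dataset A", "Dataset B")
    for line in sources:
        print("\t".join(str(field) for field in line))
```

With these made-up rows, the merged list holds three mutations and the source matrix flags S1 as present only in the first input, S2 in both, and S3 only in the second. The optional labels play the role of `--aLabel` and `--bLabel` in the command block above.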