Mercurial > repos > melissacline > ucsc_cancer_utilities

--- a/mergeMutationDatasets.xml	Fri Sep 18 10:24:39 2015 -0700
+++ b/mergeMutationDatasets.xml	Fri Sep 18 11:03:59 2015 -0700
@@ -24,13 +24,23 @@
     <data name="outputC" format="tabular" label="Mutation by Position ${labelForDatasetA}+${labelForDatasetB}"/>
   </outputs>
   <help>
-    ***Merge Xena Positional Mutation Datasets***

-    Output Mutation by Position datafile is ready to be imported into a Xena Hub.
+**Merge Xena Positional Mutation Datasets**
+
+1. Input Xena positional mutation data file format: tab-delimited

-    Output Data Source is of format Rows (Samples) by Columns (identifiers), ready to be imported into a Xena Hub.
+   =======    =====  ======= ===== ========= ====== ========
+   sample     chr    start   end   reference alt    anything
+   =======    =====  ======= ===== ========= ====== ========
+   sample1    chr1   1       1     A         T      0.2
+   sample1    chr1   10      10    T         A      0.1
+   sample2    chr1   20      20    G         GG     0.0
+   sample2    chr1   20      21    GT        G
+   ...        ...    ...     ...   ...       ...    ...
+   =======    =====  ======= ===== ========= ====== ========

-    Given two datasets of mutation data as formatted for the UCSC Xena Browser, merge them to produce a third dataset that is the union of the first two.  The new dataset will contain all mutations from either dataset.
+
+2. Output file 1: the merged dataset, which is the union of the two input mutation datasets and contains every mutation from either one.

-    To maintain provenance, this script also outputs a second matrix, with one row for each sample ID that appears in the output dataset, and two columns per row indicating which input dataset(s) contained some mutation data for that sample.  By default, the input dataset name is used to indicate which input file each column came from.  Optionally, the user can specify descriptive labels to be used in place of the dataset names.   </help>
+3.  Output file 2: To maintain provenance, this script also outputs a second data file, with one row for each sample ID that appears in the output dataset, and two columns per row indicating which input dataset(s) contained some mutation data for that sample.  Users can specify descriptive labels to indicate the data source.   </help>
 </tool>
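
The merge and provenance outputs described in the help text above can be illustrated with a minimal Python sketch. This is not the tool's actual script: the function names (`read_mutations`, `merge_mutations`, `provenance`), the shared-header assumption, the dropping of exact duplicate rows, and the `1`/`0` provenance flags are all assumptions made for illustration.

```python
import csv
import io


def read_mutations(handle):
    """Return (header, rows) from a tab-delimited positional mutation file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def merge_mutations(rows_a, rows_b):
    """Union of two row lists, in first-seen order.

    Assumption: a row that is identical in both inputs is emitted once.
    """
    seen = set()
    merged = []
    for row in rows_a + rows_b:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged


def provenance(rows_a, rows_b, label_a="datasetA", label_b="datasetB"):
    """One row per sample ID, flagging which input(s) held data for it."""
    samples_a = {row[0] for row in rows_a}
    samples_b = {row[0] for row in rows_b}
    table = [["sample", label_a, label_b]]
    for sample in sorted(samples_a | samples_b):
        table.append([
            sample,
            "1" if sample in samples_a else "0",
            "1" if sample in samples_b else "0",
        ])
    return table


# Hypothetical inputs using the column layout from the help text.
HEADER = "sample\tchr\tstart\tend\treference\talt\n"
file_a = HEADER + "sample1\tchr1\t1\t1\tA\tT\nsample1\tchr1\t10\t10\tT\tA\n"
file_b = HEADER + "sample1\tchr1\t1\t1\tA\tT\nsample2\tchr1\t20\t20\tG\tGG\n"

header, rows_a = read_mutations(io.StringIO(file_a))
_, rows_b = read_mutations(io.StringIO(file_b))

merged = merge_mutations(rows_a, rows_b)
prov = provenance(rows_a, rows_b)

for row in [header] + merged:
    print("\t".join(row))
for row in prov:
    print("\t".join(row))
```

Keying the provenance table on the sample ID alone follows the help text, which says the second output has one row per sample rather than one row per mutation.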
--- a/segToGeneMatrix.xml	Fri Sep 18 10:24:39 2015 -0700
+++ b/segToGeneMatrix.xml	Fri Sep 18 11:03:59 2015 -0700
@@ -31,7 +31,7 @@
 1. Input data file format: tab-delimited

    =======    =====  ======= ===== ====== ======
-   sanmple    chr    start   end   strand value
+   sample     chr    start   end   strand value
    =======    =====  ======= ===== ====== ======
    sample1    chr1   1       100   .      0.5
    sample2    chr1   101     1000  .      1.5
--- a/segToMatrix.xml	Fri Sep 18 10:24:39 2015 -0700
+++ b/segToMatrix.xml	Fri Sep 18 11:03:59 2015 -0700
@@ -32,7 +32,7 @@
 1. Input data file format: tab-delimited

    =======    =====  ======= ===== ====== ======
-   sanmple    chr    start   end   strand value
+   sample     chr    start   end   strand value
    =======    =====  ======= ===== ====== ======
    sample1    chr1   1       100   .      0.5
    sample2    chr1   101     1000  .      1.5