<tool id="hcluster" name="Hierarchical Clustering (HAC)" force_history_refresh="True">
    <command interpreter="python">hclust.py
-d $dataset
${dist_obj}
-n ${direction}
-m ${distance_metric}
-l ${linkage}

#if str($numk) != "-1":
-k ${numk}
#end if

#if str($direction) == "rows":
-o ${rdata_output_rows}
#end if

#if str($direction) == "cols":
-o ${rdata_output_cols}
#end if

</command>
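    <!--
    Illustrative sketch only, not part of the tool: the command template above
    can be read as the following argument assembly. The function name
    build_hclust_args and its parameter names are hypothetical; only the flags
    (-d, -D, -n, -m, -l, -k, -o) come from the template itself.

```python
def build_hclust_args(dataset, direction, distance_metric, linkage,
                      output_path, dist_obj=False, numk=-1):
    """Assemble the hclust.py argument list in the same order as the template.

    Hypothetical helper for illustration; the flags mirror the <command> block.
    """
    if direction not in ("rows", "cols"):
        raise ValueError("direction must be 'rows' or 'cols'")

    args = ["hclust.py", "-d", dataset]
    if dist_obj:
        # The boolean parameter expands to "-D" when checked, else to nothing.
        args.append("-D")
    args += ["-n", direction, "-m", distance_metric, "-l", linkage]
    if numk != -1:
        # -k is only passed when the user overrides the default of -1.
        args += ["-k", str(numk)]
    # Exactly one output is written, chosen by the clustering direction.
    args += ["-o", output_path]
    return args
```

    With the tool defaults (rows, cosine, complete, numk of -1, box unchecked)
    this yields only the -d, -n, -m, -l and -o arguments.
    -->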
    <inputs>
    	<param name="dataset" type="data" format='tabular' label="Data Set" help="Specify the data matrix (tab-delimited) to be clustered"/>
	<param name="dist_obj" type="boolean" label="Distance Object (R dist object)?" truevalue="-D" falsevalue="" checked="False" help="Check if the matrix contains the pairwise distances between a set of objects"/>
    	<param name="direction" type="select" label="Cluster Samples or Genes?" help="Specify the matrix dimension to cluster (see help below)">
	  <option value="cols">Columns (Samples)</option>
	  <option value="rows" selected='true'>Rows (Genes)</option>
    	</param>

    	<param name="distance_metric" type="select" label="Distance Metric" help="Specify the distance metric to use (see help below)">
	  <option value="cosine" selected='true'>Cosine</option>
	  <option value="abscosine">Absolute Cosine</option>
	  <option value="pearson">Pearson</option>
	  <option value="abspearson">Absolute Pearson</option>
	  <option value="spearman">Spearman</option>
	  <option value="kendall">Kendall</option>
	  <option value="euclidean">Euclidean</option>
	  <option value="maximum">Maximum</option>
	  <option value="manhattan">Manhattan (AKA city block)</option>
	  <option value="canberra">Canberra</option>
	  <option value="binary">Binary</option>
    	</param>

    	<param name="linkage" type="select" label="Linkage" help="Specify the linkage to use when clustering (see help below)">
	  <option value="average">Average</option>
	  <option value="centroid">Centroid</option>
	  <option value="complete" selected='true'>Complete</option>
	  <option value="mcquitty">McQuitty</option>
	  <option value="median">Median</option>
	  <option value="single">Single</option>
	  <option value="ward">Ward</option>
    	</param>

    	<param name="numk" type="integer" label="Number of Clusters" value="-1" help="Specify the number of clusters to use (-1 to use the default; see help below)."/>

    </inputs>
    <outputs>
      <data format="rdata" name="rdata_output_rows" label="Hierarchical Clustering Results; Gene Clusters (RData)">
        <filter>(direction)=="rows"</filter>
      </data>
      <data format="rdata" name="rdata_output_cols" label="Hierarchical Clustering Results; Sample Clusters (RData)">
        <filter>(direction)=="cols"</filter>
      </data>
    </outputs>
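    <!--
    Illustrative sketch only, not part of the tool: the help text for this tool
    states that a Number of Clusters of -1 selects a default of 5 for columns
    and num_rows/30 for rows. One way to express that rule is shown here. The
    function name resolve_num_clusters is hypothetical, and the use of integer
    division is an assumption; how hclust.py rounds, or what it does when the
    matrix has fewer than 30 rows, is not stated in this file.

```python
def resolve_num_clusters(direction, n_rows, numk=-1):
    """Return the cluster count, applying the documented defaults for -1.

    Hypothetical helper; the rounding behaviour is an assumption.
    """
    if numk != -1:
        # An explicit value from the user is used unchanged.
        return numk
    if direction == "cols":
        # Documented default when clustering samples/columns.
        return 5
    if direction == "rows":
        # Documented default when clustering genes/rows: num_rows/30.
        # Integer division is assumed here.
        return n_rows // 30
    raise ValueError("direction must be 'rows' or 'cols'")
```

    For the example given in the help, a 600-row matrix clustered by rows
    resolves to 20 clusters.
    -->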
<help>
.. class:: infomark

**Perform hierarchical clustering of the rows (genes) or columns (samples) of a specified data set**

----

**Parameters**

- **Data Set** - Specify the data matrix to be clustered.  Data must be formatted as follows:

         * Tab-delimited
         * Use row/column headers

- **Cluster Samples or Genes** - Specify the dimension of the matrix to cluster:

         * Rows (Genes)
         * Columns (Samples)

- **Distance Object** - Specify whether the data set is already a pairwise distance matrix (an R dist object) rather than a raw data matrix

- **Distance Metric** - Specify the distance metric to use.  Choice of:

         * Cosine (AKA uncentered Pearson)
         * Absolute Cosine (AKA uncentered Pearson, absolute value)
         * Pearson (Pearson correlation)
         * Absolute Pearson (Pearson correlation, absolute value)
         * Spearman (Spearman correlation)
         * Kendall (Kendall's Tau)
         * Euclidean (Euclidean distance)
         * Maximum
         * Manhattan (AKA city block)
         * Canberra
         * Binary

- **Linkage** - Specify the linkage to use when clustering (see the documentation for R's hclust function for an explanation of the choices).  Choice of:

         * Average
         * Single
         * Complete
         * Median
         * Centroid
         * McQuitty
         * Ward

- **Number of Clusters** - Specify the number of clusters to use.  If set to -1, a default value is used:

         * If samples/columns are being clustered, the **default** is 5.
         * If genes/rows are being clustered, the **default** is num_rows/30, e.g. if there are 600 rows/genes in the matrix, the default will be 20 clusters.

</help>
</tool>
author	peter-waltman
date	Mon, 11 Mar 2013 16:31:29 -0400
parents	0decf3fd54bc