<!-- seqpos.xml @ 0:c392c4007d5e; author: jjohnson; date: Tue, 14 Oct 2014 10:07:00 -0400 -->

<tool name="SeqPos motif tool" id="motif_denovo" version="0.1.0">
  <description>Find motifs enriched near the centers of the given regions</description>
  <macros>
    <import>tool_macros.xml</import>
  </macros>
  <!-- cistrome numpy jinja2 -->
  <expand macro="requirements_seqpos" />
  <command>
MDSeqPos.py 
#include source=$ref_genome_seq_opts#
#if $search_type != None and len(str($search_type)) > 0:
  -m $search_type
#end if
$denovo
-v -c --hcluster="$hcluster" -w "$width" 
#if $maxmotif != None:
 --maxmotif=$maxmotif 
#end if
-p "$pval"
#if $species_list != None and len(str($species_list)) > 0:
  -s $species_list
#end if
$bfile $bfile.metadata.dbkey &amp;> $log &amp;&amp; 
cp results/table.html $output_html  &amp;&amp;
mkdir $output_html.extra_files_path  &amp;&amp;
cp -R results/* $output_html.extra_files_path
  </command>
  <expand macro="stdio"/>
  <inputs>
      <param format="bed" name="bfile" type="data" label="BED file (at most 5,000 lines; if you have more, sort them and keep the top 5,000 first)" help="Tip: chromosome names in the BED file must not look like 'chr1_xxxx'. Filter such lines out with the tool 'Filter and Sort -> Select', choosing 'NOT matching' with the pattern '^chr([0-9A-Za-z])+_'">
        <validator type="unspecified_build" />
      </param>
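      <!--
        A minimal Python sketch (not part of SeqPos) of the input preparation
        that the bfile label and help ask for. It drops contigs matching
        '^chr([0-9A-Za-z])+_' and drops regions outside their chromosome. It
        then keeps the top 5,000 lines. Assumptions: "top" means the highest
        BED score (column 5). chrom_sizes is a hypothetical {name: length}
        dict, for example loaded from a UCSC chromInfo table. prefilter_bed is
        an illustrative name.

```python
import re

# Pattern quoted in the bfile help text: matches contig names like "chr1_xxxx".
BAD_CHROM = re.compile(r"^chr([0-9A-Za-z])+_")


def prefilter_bed(lines, chrom_sizes, limit=5000):
    """Return at most `limit` BED lines that should be safe to submit.

    Skips header/blank lines, chromosomes matching BAD_CHROM or missing from
    `chrom_sizes`, and intervals that fall outside their chromosome.
    Remaining lines are ranked by the score column (column 5), highest
    first; lines without a numeric score rank last.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if BAD_CHROM.match(chrom) or chrom not in chrom_sizes:
            continue
        # Out-of-boundary regions make SeqPos fail, per the tool help.
        if start < 0 or end > chrom_sizes[chrom] or start >= end:
            continue
        try:
            score = float(fields[4])
        except (IndexError, ValueError):
            score = float("-inf")
        kept.append((score, line))
    # sorted() stays stable with reverse=True, so ties keep file order.
    kept = sorted(kept, key=lambda item: item[0], reverse=True)
    return [line for _, line in kept[:limit]]
```
      -->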
      
      <expand macro="refGenomeSourceConditional"/>

      <param name="search_type" type="select" multiple="true" display="checkboxes" force_select="true" optional="false" label="Select which motif database(s) to use">
        <option value="cistrome.xml" selected="true">cistrome (Curated)</option>
        <option value="pbm.xml">pbm</option>
        <option value="y1h.xml">y1h</option>
        <option value="transfac.xml">transfac</option>
        <option value="hpdi.xml">hpdi</option>
        <option value="jaspar.xml">jaspar</option>
      </param>
      <param name="denovo" type="boolean" truevalue="-d" falsevalue="" checked="false" label="Include de novo motif search"/>

      <param name="species_list" type="select" multiple="true" display="checkboxes" optional="true" label="Select which species to filter the results by (Optional)">
          <option value="hs,mm">Homo sapiens or Mus musculus</option>
          <option value="ce">Caenorhabditis elegans</option>
          <option value="dm">Drosophila melanogaster</option>
      </param>
      <param name="width" type="integer" label="Width of the region to scan around each region center (bp)" value="600">
        <validator type="in_range" max="10000" min="100" message="Width is out of range; it must be between 100 and 10000" />
      </param>
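      <!--
        A sketch of the scan window implied by the width parameter. It assumes
        the window is centered on the midpoint of each BED interval (the help
        calls this the peak summit). Its integer rounding may differ from
        MDSeqPos.py. scan_window is a hypothetical helper, not SeqPos code.

```python
def scan_window(start, end, width=600):
    """Return (win_start, win_end) for a `width`-bp window centered on a region.

    Assumes 0-based, half-open BED coordinates and takes the summit to be
    the interval midpoint, rounded down. The result may start below 0 or end
    past the chromosome length; check for that, since the help warns that
    out-of-boundary regions cause errors.
    """
    # Same bounds as the in_range validator on the width parameter.
    if not 100 <= width <= 10000:
        raise ValueError("width has to be between 100 and 10000")
    center = (start + end) // 2
    win_start = center - width // 2
    return win_start, win_start + width
```
      -->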
      <param name="pval" type="float" label="p-value cutoff" value="0.001">
        <validator type="in_range" max="1" min="0" message="P-value is out of range; it must be between 0 and 1" />
      </param>
      <param name="maxmotif" type="integer" label="Maximum number of hits to report (0 reports every hit that passes the p-value cutoff)" value="0" min="0" optional="true" />
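      <!--
        A sketch of how the p-value cutoff and the max output hits setting
        combine. Assumptions: hits are (motif_id, p_value) pairs, a hit passes
        when its p-value is at or below the cutoff, and hits are ranked by
        ascending p-value. select_hits is hypothetical, not MDSeqPos.py code.

```python
def select_hits(hits, pval=0.001, maxmotif=0):
    """Return the hits that pass `pval`, most significant first.

    `hits` is an iterable of (motif_id, p_value) pairs. maxmotif == 0 keeps
    every passing hit, as the parameter label describes; a positive value
    keeps at most that many.
    """
    passing = sorted((h for h in hits if h[1] <= pval), key=lambda h: h[1])
    return passing if maxmotif == 0 else passing[:maxmotif]
```
      -->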
      <param name="hcluster" type="text" label="Similarity cutoff for hierarchical clustering of the output (0 to 1; the higher the cutoff, the more groups)" value="0.8"/>
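      <!--
        This sketch shows why a higher hcluster cutoff gives more groups. It
        links two motifs when their similarity reaches the cutoff and reports
        the connected groups. That matches cutting a single-linkage tree at
        that similarity. The linkage method, the similarity(a, b) function and
        the name group_motifs are assumptions. SeqPos's own motif similarity
        measure and clustering may differ.

```python
def group_motifs(names, similarity, cutoff=0.8):
    """Group motifs whose pairwise similarity reaches `cutoff`.

    `names` is a list of motif ids; `similarity(a, b)` returns a score in
    [0, 1]. Two motifs share a group if a chain of pairs with similarity
    >= cutoff links them. Raising the cutoff removes links, so it yields
    more (smaller) groups.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent pointers to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())
```
      -->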
  </inputs>
  <outputs>
      <data format="xml" name="output_xml" label="SeqPos xml output on ${bfile.name}" from_work_dir="results/denovo.xml">
        <filter>denovo == True</filter>
      </data>
      <data format="html" name="output_html" label="SeqPos html output on ${bfile.name}"/>
      <data format="txt" name="log" label="SeqPos Log on ${bfile.name}"/>
  </outputs>
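  <!--
    The help below says SeqPos ranks motifs by how close their hits lie to
    the peak summits. This sketch is a swapped-in illustration only. It is
    not the SeqPos statistic, which is described in the supplementary
    material of the cited paper. It compares the fraction of hits within
    core/2 bp of the summit with the uniform expectation core/width, using a
    one-sided binomial tail. center_enrichment and core are hypothetical
    names.

```python
from math import comb


def center_enrichment(distances, width=600, core=100):
    """Compare the share of motif hits near the summit with a uniform baseline.

    `distances` are signed offsets (bp) of motif hits from the summit, all
    inside the scanned window of `width` bp. Under a uniform spread, a hit
    lands within core/2 bp of the summit with probability about core/width.
    Returns (observed_fraction, expected_fraction, p_value), where p_value is
    the one-sided binomial probability of at least that many central hits.
    """
    if not distances:
        raise ValueError("need at least one motif hit")
    n = len(distances)
    k = sum(1 for d in distances if abs(d) <= core / 2)
    p0 = core / width
    p_value = sum(
        comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1)
    )
    return k / n, p0, p_value
```
  -->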
  <help>
The **SeqPos** tool finds motifs enriched in a set of
regions. **SeqPos** uses the distances from motif positions to the peak
summits (the centers of the regions) to find the motifs most enriched
near peak summits. **SeqPos** can scan all the motifs in TRANSFAC,
Martha's Protein Binding Microarray (a.k.a. PBM) database and Scot
Wolfe's protein-DNA binding database (y1h). **SeqPos** can also search
for *de novo* motifs using the MDscan algorithm. Finally, **SeqPos** can
cluster similar motifs into a cluster tree to help users filter out
redundant motifs. This tool was written by Cliff Meyer and Len Taing. A
detailed explanation of the algorithm can be found in the supplementary
material of the paper "Nucleosome dynamics define transcriptional
enhancers." (Nat Genet, 42(4):343-347). The tool was later modified by
Jian Ma and Tao Liu. Version: 0.590.

About our curated Cistrome motif database: this database includes only
human and mouse data. It combines motifs from TRANSFAC, JASPAR,
UniPROBE (PBM) and hPDI, together with motifs derived from ChIP-seq
data. Motifs that are similar to one another are then removed to keep
the database clean and compact. This is the recommended database, and
it is continually updated.

.. class:: infomark

**TIP:** Make sure the regions in your BED file are valid. A region
that extends beyond the boundary of its chromosome will cause an error.
Also avoid abnormal chromosome names such as 'chr1_xxxx'.

.. class:: infomark

**TIP:** Running time increases with the number of regions, so avoid
using more than 10,000 regions as input.

.. class:: warningmark

**NEED IMPROVEMENT**

-----

**Parameters**

- **BED file** is the input file. It can be the output of peak-calling
  software. Make sure that no region in the BED file extends beyond the
  boundary of its chromosome.
  *This file may contain at most 5,000 lines. If it has more, filter it
  with Galaxy's Filter and Sort tools*.

- **Genome Assembly version** is the UCSC database version.
- **Motif databases** are the known motif collections in Cistrome,
  including TRANSFAC, PBM and Scot Wolfe's database. Select
  *de novo motif search* to also run a *de novo* motif scan.
- **Species list** restricts the results to the selected species.
  Select no species to see all of the results.
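- **Width of regions** is the width, in bp, of the window scanned for
  motifs around each peak summit (the center of each input region).
- **P-value cutoff** can be used to filter the results.
- **Max output hits** limits the number of hits reported; 0 reports
  every hit that passes the p-value cutoff.
- **Hierarchical clustering cutoff** is the similarity cutoff (0 to 1)
  used to cluster the output; the higher the cutoff, the more groups.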

.. class:: infomark

**TIP:** To browse the known motif databases, click here_

.. _here: http://cistrome.org/~jian/motif_collection/databases/Cistrome/Cistrome.xml

-----

**Output**

- **HTML output** can be opened in a web browser. Browse the results
  either in the list view in the middle of the page or in the cluster
  tree view at the bottom; the details of a motif appear in the detail
  view at the top. The list view can be sorted by any column. The detail
  view provides two buttons: one opens the motif details in a separate
  web page, the other shows the PSSM of the motif.
- **XML output** is the *de novo* motif result in XML format; it is
  produced only when *de novo* motif search is enabled.
- **LOG file** contains the job log. If you see errors, please attach
  it to your bug report.

  </help>

</tool>