seqpos.xml @ 0:c392c4007d5e
Imported from capsule None
author | jjohnson |
---|---|
date | Tue, 14 Oct 2014 10:07:00 -0400 |
<tool name="SeqPos motif tool" id="motif_denovo" version="0.1.0">
    <description>Find motifs enriched near the centers of given regions</description>
    <macros>
        <import>tool_macros.xml</import>
    </macros>
    <!-- cistrome numpy jinja2 -->
    <expand macro="requirements_seqpos" />
    <command><![CDATA[
        MDSeqPos.py
        #include source=$ref_genome_seq_opts#
        #if $search_type != None and len(str($search_type)) > 0:
            -m $search_type
        #end if
        $denovo -v -c --hcluster="$hcluster" -w "$width"
        #if $maxmotif != None:
            --maxmotif=$maxmotif
        #end if
        -p "$pval"
        #if $species_list != None and len(str($species_list)) > 0:
            -s $species_list
        #end if
        $bfile $bfile.metadata.dbkey &> $log &&
        cp results/table.html $output_html &&
        mkdir "$output_html.extra_files_path" &&
        cp -R results/* "$output_html.extra_files_path"
    ]]></command>
    <expand macro="stdio"/>
    <inputs>
        <param format="bed" name="bfile" type="data" label="BED file (at most 5K lines. If you have more than 5K lines, please sort them and pick the top 5K lines first)" help="Tip: chromosome names in the BED file cannot look like 'chr1_xxxx'.
You need to filter them out with the 'Filter and Sort -> Select' tool, choosing 'NOT matching' for the pattern '^chr([0-9A-Za-z])+_'">
            <validator type="unspecified_build" />
        </param>
        <expand macro="refGenomeSourceConditional"/>
        <param name="search_type" type="select" multiple="true" display="checkboxes" force_select="true" optional="false" label="Select which motif database(s) to use">
            <option value="cistrome.xml" selected="true">cistrome (Curated)</option>
            <option value="pbm.xml">pbm</option>
            <option value="y1h.xml">y1h</option>
            <option value="transfac.xml">transfac</option>
            <option value="hpdi.xml">hpdi</option>
            <option value="jaspar.xml">jaspar</option>
        </param>
        <param name="denovo" type="boolean" truevalue="-d" falsevalue="" checked="false" label="Include de novo motif search"/>
        <param name="species_list" type="select" multiple="true" display="checkboxes" optional="true" label="Select which species to filter the results by (Optional)">
            <option value="hs,mm">Homo sapiens or Mus musculus</option>
            <option value="ce">Caenorhabditis elegans</option>
            <option value="dm">Drosophila melanogaster</option>
        </param>
        <param name="width" type="integer" label="Width of the region to be scanned" value="600">
            <validator type="in_range" max="10000" min="100" message="Width is out of range; it must be between 100 and 10000" />
        </param>
        <param name="pval" type="float" label="P-value cutoff" value="0.001">
            <validator type="in_range" max="1" min="0" message="P-value is out of range; it must be between 0 and 1" />
        </param>
        <param name="maxmotif" type="integer" label="Maximum number of output hits
(0 outputs all hits that pass the p-value cutoff)" value="0" min="0" optional="true" />
        <param name="hcluster" type="text" label="Similarity cutoff for hierarchical clustering of the output (0 to 1; the higher the cutoff, the more groups)" value="0.8"/>
    </inputs>
    <outputs>
        <data format="xml" name="output_xml" label="SeqPos xml output on ${bfile.name}" from_work_dir="results/denovo.xml">
            <filter>denovo == True</filter>
        </data>
        <data format="html" name="output_html" label="SeqPos html output on ${bfile.name}"/>
        <data format="txt" name="log" label="SeqPos Log on ${bfile.name}"/>
    </outputs>
    <help>
The **SeqPos** tool finds motifs enriched in a set of regions. **SeqPos** uses the distances from motif positions to the peak summits (the centers of the regions) to find the motifs most enriched near peak summits. **SeqPos** can scan all the motifs in TRANSFAC, Martha Bulyk's Protein Binding Microarray (a.k.a. PBM) data and Scot Wolfe's protein-DNA binding database (y1h). **SeqPos** can also search for *de novo* motifs using the MDscan algorithm. Finally, **SeqPos** can group similar motifs in a cluster tree to help users filter out redundant motifs.

This tool was written by Cliff Meyer and Len Taing. A detailed explanation of the algorithm can be found in the supplementary material of the paper "Nucleosome dynamics define transcriptional enhancers" (Nat Genet, 42(4):343-347). The tool was later modified by Jian Ma and Tao Liu.

Version: 0.590

About our curated cistrome motif database: this database includes only human and mouse data. It combines data from TRANSFAC, JASPAR, UniPROBE (PBM) and hPDI, and also includes motifs derived from ChIP-seq data. Motifs that closely resembled one another were then removed to keep the database clean and small. This database is recommended and is continually updated.

.. class:: infomark

**TIP:** Please make sure the regions in your BED file are valid! A region that extends beyond the chromosome boundary will cause an error.
Also avoid unusual chromosome names such as 'chr1_xxxx'.

.. class:: infomark

**TIP:** The running time increases with the number of regions. Please avoid using more than 10,000 regions as input.

.. class:: warningmark

**NEED IMPROVEMENT**

-----

**Parameters**

- **BED file** is the input file. It can be the output of a peak caller. Make sure that no region in the BED file extends beyond its chromosome boundary. *This file can contain at most 5000 lines. If it has more, filter it with the Galaxy Filter and Sort tools*.
- **Genome Assembly version** is the UCSC database version.
- **Motif databases** are the known motif collections in Cistrome, including TRANSFAC, PBM and Scot Wolfe's database. Select *de novo motif search* to also run a *de novo* motif scan.
- **Species list** is the set of species used to filter the results. Select none of the species to see all of the results.
- **Width of regions** is the width of the region scanned for motifs around each peak summit (the center of each input region).
- **P-value cutoff** is the threshold used to filter the results.

.. class:: infomark

**TIP:** To browse the known motif databases, click here_

.. _here: http://cistrome.org/~jian/motif_collection/databases/Cistrome/Cistrome.xml

-----

**Output**

- **HTML output** can be opened in a web browser. Users can browse the results either in the list view in the middle of the page or in the cluster tree view at the bottom. The details of a motif are shown in the detail view at the top. The list view can be sorted by any field. The detail view provides two buttons: one opens the detailed information in a separate web page, and the other shows the PSSM of the motif.
- **XML output** is the XML-formatted output. It is produced only when *de novo* motif search is enabled.
- **LOG file** is the job log. If you see errors, please attach it to your bug report.
    </help>
</tool>
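The help text limits the input to 5,000 regions and rejects chromosome names matching `^chr([0-9A-Za-z])+_`. The intended route is Galaxy's Filter and Sort tools. For files prepared outside Galaxy, the following minimal Python sketch performs the same two steps. The function name `prepare_bed` and the assumption that the peak score sits in BED column 5 are ours, not part of SeqPos. Peak callers differ, so check which column holds your score.

```python
import re
from typing import Iterable, List

# Pattern from the tool's help text: matches names such as 'chr1_random'
# or 'chrUn_gl000220', which SeqPos cannot handle.
BAD_CHROM = re.compile(r"^chr([0-9A-Za-z])+_")

# Line limit stated in the tool's label and help.
MAX_REGIONS = 5000


def prepare_bed(lines: Iterable[str], max_regions: int = MAX_REGIONS,
                score_col: int = 4) -> List[str]:
    """Drop unsupported chromosome names, then keep the top-scoring regions.

    Hypothetical helper: assumes the score is in 0-based column `score_col`
    (BED column 5 by default). Rows without that column sort as score 0.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Same rule as 'Filter and Sort -> Select' with 'NOT matching'.
        if BAD_CHROM.match(fields[0]):
            continue
        kept.append(fields)

    def score(fields: List[str]) -> float:
        return float(fields[score_col]) if len(fields) > score_col else 0.0

    # Highest score first; Python's sort is stable, so ties keep file order.
    kept.sort(key=score, reverse=True)
    return ["\t".join(f) for f in kept[:max_regions]]
```

Passing an open file, for example `prepare_bed(open("peaks.bed"))` with `peaks.bed` as a placeholder name, returns the filtered, score-sorted lines without trailing newlines, ready to be written back out.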
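The help text says SeqPos scans a window of the chosen width around each peak summit, taken as the center of the input region. It also warns that regions running past a chromosome end cause errors. The sketch below illustrates that geometry under stated assumptions: the summit is the integer midpoint and the window is half-open. The function `scan_window` and its out-of-bounds check are hypothetical illustrations, not code from MDSeqPos.py. Only the 100 to 10000 range comes from the tool itself, where it mirrors the `width` validator.

```python
from typing import Optional, Tuple


def scan_window(start: int, end: int, width: int = 600,
                chrom_size: Optional[int] = None) -> Tuple[int, int]:
    """Return a half-open [lo, hi) window of `width` bp centered on a region.

    Hypothetical helper: the rounding used by MDSeqPos.py may differ.
    Raises ValueError for widths outside the tool's validator range and for
    windows that run past either end of the chromosome.
    """
    if not 100 <= width <= 10000:
        raise ValueError("width has to be between 100 and 10000")
    center = (start + end) // 2   # summit approximated by the region center
    lo = center - width // 2
    hi = lo + width
    if lo < 0 or (chrom_size is not None and hi > chrom_size):
        raise ValueError(f"window [{lo}, {hi}) falls outside the chromosome")
    return lo, hi
```

For example, a BED region `chr1 1000 1200` with the default width of 600 gives the window 800 to 1400. A region near a chromosome start, such as `chr1 0 100`, would need a window starting at -250. This is the kind of out-of-boundary case the help warns about.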