# HG changeset patch # User jjohnson # Date 1413295620 14400 # Node ID c392c4007d5ec0219ed968fa3af45b7765a96715 Imported from capsule None diff -r 000000000000 -r c392c4007d5e screen.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/screen.xml Tue Oct 14 10:07:00 2014 -0400 @@ -0,0 +1,159 @@ + + Given a motif, find all regions that match the motif + + tool_macros.xml + + + + +MotifScan.py +#include source=$ref_genome_seq_opts# +-b $output_bed -f $output_fa +#if $motifs.motifSrc == 'raw': +-p $user_db_file $bfile $bfile.metadata.dbkey ignore_this +#elif $motifs.motifSrc == 'pssm': +-p $motifs.pssm_file $bfile $bfile.metadata.dbkey ignore_this +#elif $motifs.motifSrc == 'db': +-m $motifs.db $bfile $bfile.metadata.dbkey $motifs.motif_id +#elif $motifs.motifSrc == 'userdb': +-n $motifs.usr_db $bfile $bfile.metadata.dbkey $motifs.motif_id +#end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +#if $motifs.motifSrc == 'raw': +echo $motifs.file_data +#end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Given a motif, this tool will find all regions that match the +motif. This tool is made by Cliff Meyer and Len Taing. + +.. class:: infomark + +**TIP:** Please check the result from Seqpos tool to understand the +parameters of this tool, such as Motif id, xml file, PSSM + +.. class:: warningmark + +**NEED IMPROVEMENT** + +----- + +**4 ways to specify the input** + +- **Method 1:** You can specify a motif in our motif database by the + motif id such as MM00481 for AR motif in TRANSFAC. This way, you + need to provide id in **Motif id**, and choose a "motif database" + from PBM, TRANSFAC or Y1H. +- **Method 2:** You can specify a motif with Seqpos result by the + motif id such as MM00481_observed for observed AR motif. This way, + you need to provide id in **Motif id**, and choose the Seqpos output + xml file in the drop-down menu of **Motif XML file**. 
+- **Method 3:** You can upload a PSSM file containing a motif matrix + to the history, and choose it from the drop-down menu of **PSSM file**. +- **Method 4:** You can paste a PSSM raw text string into **PSSM Raw + Text** to scan. An example of this string can be seen in the Seqpos + HTML result by selecting a motif and clicking the *show pssm in a new + window* button. + +**Other parameters** + +- **BED file** defines the regions you want to scan for the motif. +- **Genome Assembly version** is the UCSC database version. The tool + uses this information to extract the DNA sequences in the regions of + the **BED file**. + +.. class:: infomark + +**TIP:** To browse the known motif databases, click here_ +link to: http://cistrome.org/~jian/motif_collection/databases/Cistrome/Cistrome.xml + +.. _here: http://cistrome.org/~jian/motif_collection/databases/Cistrome/Cistrome.xml + +----- + +**Output** + +- **BED output** contains the regions with the motif. +- **Fasta output** contains the DNA sequences of the motif. + + + + + diff -r 000000000000 -r c392c4007d5e seqpos.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/seqpos.xml Tue Oct 14 10:07:00 2014 -0400 @@ -0,0 +1,146 @@ + + Find motifs from given regions enriched near the centers + + tool_macros.xml + + + + +MDSeqPos.py +#include source=$ref_genome_seq_opts# +#if $search_type != None and len(str($search_type)) > 0: + -m $search_type +#end if +$denovo +-v -c --hcluster="$hcluster" -w "$width" +#if $maxmotif != None: + --maxmotif=$maxmotif +#end if +-p "$pval" +#if $species_list != None and len(str($species_list)) > 0: + -s $species_list +#end if +$bfile $bfile.metadata.dbkey &> $log && +cp results/table.html $output_html && +mkdir $output_html.extra_files_path && +cp -R results/* $output_html.extra_files_path + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + denovo == True + + + + + +The **SeqPos** tool will find motifs enriched in a set of +regions. 
**SeqPos** uses the distances from motif positions to the peak +summits (the centers of the regions) to find the motifs most enriched near +peak summits. **SeqPos** can scan all the motifs in TRANSFAC, Martha's +Protein Binding Microarray (a.k.a. PBM) and Scot Wolfe's protein-DNA +binding database (Y1H). **SeqPos** can also try to find *de novo* +motifs using the MDscan algorithm. Finally, **SeqPos** can cluster +similar motifs in a cluster tree to help users filter out redundant +motifs. This tool is made by Cliff Meyer and Len Taing. A detailed +explanation of the algorithm can be found in the supplementary +material of the paper "Nucleosome dynamics define transcriptional +enhancers." (Nat Genet, 42(4):343-347) The tool was later modified by +Jian Ma and Tao Liu. Version: 0.590. + +About our curated cistrome motif database: This database only +includes human and mouse data. It combines data from Transfac, +JASPAR, UniPROBE (pbm) and hPDI, and it also includes motifs derived +from ChIP-seq data. Motifs that look similar to +each other are then removed to keep the database clean and small. This database is +recommended and is continually updated. + +.. class:: infomark + +**TIP:** Please make sure the regions in your BED file are valid! If +a region extends beyond the boundary of a chromosome, it will cause an error. Also +please avoid abnormal chromosome names. + +.. class:: infomark + +**TIP:** The running time increases with the number of +regions. Please avoid using more than 10 thousand regions for input. + +.. class:: warningmark + +**NEED IMPROVEMENT** + +----- + +**Parameters** + +- **BED file** is the input file. It can be the output from peak-calling + software. Note that the regions in the BED + file should not extend beyond the boundary of a chromosome. + *This file can only contain at most 5000 lines. If it contains more, please + filter it using Galaxy:Filter and Sort tool*. + +- **Genome Assembly version** is the UCSC database version. 
+- **Motif databases** are the known motif collections in Cistrome, + including TRANSFAC, PBM and Scot Wolfe's database. You can select + *de novo motif search* to enable *de novo* motif scan. +- **Species list** is the list of species that you want to filter the results + with. Select none of the species to see all of the results. +- **Width of regions** is the region to scan for motifs around peak + summits (centers of input regions). +- **P-value cutoff** can be used to filter the results. + +.. class:: infomark + +**TIP:** To browse the known motif databases, click here_ + +.. _here: http://cistrome.org/~jian/motif_collection/databases/Cistrome/Cistrome.xml + +----- + +**Output** + +- **HTML output** can be opened in a web browser. Users can browse the + result in either the middle list view of the page or the bottom + cluster tree view, and the details of a motif can be seen in the top + detail view. The list view is sortable by every field. The detail + view provides two buttons to open the detailed information in a + separate webpage, or to show the PSSM of the motif. +- **XML output** is the XML-formatted output. +- **LOG file** is the job log. If you see errors, please attach it + to the bug report. + + + + diff -r 000000000000 -r c392c4007d5e tool-data/cistrome_assembly.loc.sample --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool-data/cistrome_assembly.loc.sample Tue Oct 14 10:07:00 2014 -0400 @@ -0,0 +1,18 @@ +#This file lists the locations and dbkeys of all the fasta files +#under the "genome" directory (a directory that contains a directory +#for each build). The script extract_fasta.py will generate the file +#all_fasta.loc. 
This file has the format (white space characters are +#TAB characters): +# +# +# +#So, all_fasta.loc could look something like this: +# +#apiMel3 apiMel3 Honeybee (Apis mellifera): apiMel3 /path/to/genome/apiMel3/apiMel3.fa +#hg19canon hg19 Human (Homo sapiens): hg19 Canonical /path/to/genome/hg19/hg19canon.fa +#hg19full hg19 Human (Homo sapiens): hg19 Full /path/to/genome/hg19/hg19full.fa +# +#Your all_fasta.loc file should contain an entry for each individual +#fasta file. So there will be multiple fasta files for each build, +#such as with hg19 above. +# diff -r 000000000000 -r c392c4007d5e tool_data_table_conf.xml.sample --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_data_table_conf.xml.sample Tue Oct 14 10:07:00 2014 -0400 @@ -0,0 +1,7 @@ + + + + value, dbkey, name, path + +

+ diff -r 000000000000 -r c392c4007d5e tool_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_dependencies.xml Tue Oct 14 10:07:00 2014 -0400 @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + + + + diff -r 000000000000 -r c392c4007d5e tool_macros.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_macros.xml Tue Oct 14 10:07:00 2014 -0400 @@ -0,0 +1,89 @@ + + + + + numpy + cistrome + + + + + numpy + cistrome + jinja2 + R + bioc_seqlogo + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +For details about this application, please go to: + https://bitbucket.org/cistrome/cistrome-harvard/wiki/Home + + + ------ + +**Citation** + +For the underlying tool, please cite the following publication: +"Cistrome: an integrative platform for transcriptional regulation studies," Liu T, Ortiz JA, Taing L, Meyer CA, Lee B, Zhang Y, Shin H, Wong SS, Ma J, Lei Y, Pape UJ, Poidinger M, Chen Y, Yeung K, Brown M, Turpaz Y, Liu XS. Genome Biol. 2011 Aug 22;12(8):R83. + + +