# HG changeset patch
# User jbrayet
# Date 1455120851 18000
# Node ID 0293edf403087c264f29733bb54db82c2cd75484
# Parent 299bac53243b04c16e5804e93091481646a0bb95
Uploaded

diff -r 299bac53243b -r 0293edf40308 chipmunkv6_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/chipmunkv6_wrapper.xml Wed Feb 10 11:14:11 2016 -0500
@@ -0,0 +1,111 @@
+ De novo motif finding
+ institutcuriengsintegration/chipmunk:6.0
+ chipmunkv6_wrapper.sh -f ${input_file} -n ${motif_number_selector} -s $chipmunk_version['version'] -m $minw -v $maxw -z $mode -o ${log_outfile} -i ${image_output} -x $name -r ${summary_file} -t ${seq_type}
+
+**What it does**
+
+(di)ChIPMunk detects over-represented, non-overlapping motifs in FASTA sequences.
+
+**Which ChIPMunk should you choose?** Mononucleotide vs. dinucleotide
+
+- The mononucleotide version should be used when:
+  (a) you know nothing about the motifs in your data (a “draft” run), or
+  (b) you plan to use other tools for downstream analysis (most existing tools can only use mononucleotide matrices).
+
+- The dinucleotide version is better suited to producing a more precise representation of the optimal TFBS binding model. This allows a proper estimate of the number of sequences containing motif hits, e.g. to measure the percentage of “the most reliable” ChIP-Seq peaks in a given dataset.
+
+In terms of the consensus sequence, the mono- and dinucleotide versions should generally give very similar results.
+
+
+**Type of the sequence set**
+
+**Simple**: a plain multi-FASTA file, searched in double-stranded DNA mode (the most common choice)::
+
+    > header1
+    ACTGTGTGAAA
+    > header2
+    AGTGTGTGTGTG
+
+You can omit FASTA headers, since ChIPMunk simply skips them.
+
+
+**Peak**: for peak data with a positional preference profile (often provided in wiggle files, .wig).
+The profile of each sequence, one value per base, should be placed in its FASTA header, like::
+
+    > 1.0 2.0 3.0 2.0 1.5 2.0
+    AGTAAC
+    > 1.0 2.0 3.0 2.0 1.5
+    CAGTA
+
+
+To generate peak data, see **"Peak multi-fasta generator"** in the tool panel.
+
+NOTE: when base coverage information is available, using peak data is highly recommended. This is extremely important for ChIPMunk performance.
+
+**Cite ChIPMunk**
+
+If you use ChIPMunk in your research, please cite [1] for the basic mononucleotide version and [2] for the dinucleotide version:
+
+[1] Deep and wide digging for binding motifs in ChIP-Seq data. Kulakovskiy IV, Boeva VA, Favorov AV, Makeev VJ. Bioinformatics. 2010 Oct 15;26(20):2622-3. doi: 10.1093/bioinformatics/btq488. Epub 2010 Aug 24.
+
+[2] From binding motifs in ChIP-Seq data to improved models of transcription factor binding sites. Kulakovskiy I, Levitsky V, Oshchepkov D, Bryzgalov L, Vorontsov I, Makeev V. J Bioinform Comput Biol. 2013 Feb;11(1):1340004. doi: 10.1142/S0219720013400040. Epub 2013 Jan 16.
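You can check the two input layouts described in the help (simple and peak multi-FASTA) locally before uploading. The following is a minimal sketch. The names `parse_multifasta` and `to_peak_fasta` are hypothetical and are not part of ChIPMunk, this Galaxy wrapper, or the "Peak multi-fasta generator" tool. The sketch makes two assumptions: each sequence sits on a single line, and a peak header holds whitespace-separated numbers, one per base of the sequence that follows, as in the help's examples.

```python
# Hypothetical helpers, not part of ChIPMunk or this Galaxy wrapper.
# Assumptions: one sequence per line; in peak mode each header carries
# whitespace-separated numbers, one per base of the following sequence.
from typing import List, Optional, Tuple

Record = Tuple[Optional[List[float]], str]


def parse_multifasta(text: str, peak: bool = False) -> List[Record]:
    """Return (profile, sequence) pairs; profile is None in simple mode."""
    records: List[Record] = []
    profile: Optional[List[float]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith(">"):
            # Simple mode: header text is ignored, so headers are optional.
            # Peak mode: the header holds the positional preference profile.
            profile = [float(v) for v in line[1:].split()] if peak else None
            continue
        if peak:
            if profile is None:
                raise ValueError(f"sequence {line!r} has no profile header")
            if len(profile) != len(line):
                raise ValueError(
                    f"profile has {len(profile)} values but "
                    f"{line!r} has {len(line)} bases"
                )
        records.append((profile, line))
        profile = None  # a header applies to the next sequence only
    return records


def to_peak_fasta(records: List[Tuple[str, List[float]]]) -> str:
    """Write (sequence, per-base values) pairs in the peak layout."""
    out: List[str] = []
    for seq, values in records:
        if len(values) != len(seq):
            raise ValueError(f"{seq!r}: expected one value per base")
        out.append("> " + " ".join(str(v) for v in values))
        out.append(seq)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    peak_text = (
        "> 1.0 2.0 3.0 2.0 1.5 2.0\nAGTAAC\n"
        "> 1.0 2.0 3.0 2.0 1.5\nCAGTA\n"
    )
    recs = parse_multifasta(peak_text, peak=True)
    print(recs[1])  # ([1.0, 2.0, 3.0, 2.0, 1.5], 'CAGTA')
    # Round trip back to the peak layout.
    print(to_peak_fasta([(seq, prof) for prof, seq in recs]) == peak_text)
```

Run as a script, this prints the second peak record and then `True` for the round trip. Returning `None` as the profile in simple mode keeps a single record type for both layouts. Whether ChIPMunk itself rejects a profile/sequence length mismatch is not stated in the help. The length check here is a conservative assumption.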