# HG changeset patch
# User bgruening
# Date 1391522541 18000
# Node ID c5787c91cab8d246b7411eaea2259927b64e3cfb
# Parent ef1232cedacb9cb93764d5c533c3b0712a4da9df
Uploaded

diff -r ef1232cedacb -r c5787c91cab8 Archive.zip
Binary file Archive.zip has changed
diff -r ef1232cedacb -r c5787c91cab8 bamCompare.xml
--- a/bamCompare.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/bamCompare.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 normalizes and compares two BAM files to obtain the ratio, log2ratio or difference. (bam2bigwig)
@@ -47,8 +47,8 @@
 #end if
 #end if
- #if str(region).strip() != '':
- --region 'region'
+ #if str($region).strip() != '':
+ --region '$region'
 #end if
 #if $advancedOpt.showAdvancedOpt == "yes":
@@ -74,10 +74,10 @@
-
-
-
+
@@ -205,9 +205,7 @@
 .. image:: $PATH_TO_IMAGES/norm_IGVsnapshot_indFiles.png
-You can find more details in the `bamCompare wiki`_.
-
-.. _bamCompare wiki: https://github.com/fidelram/deepTools/wiki/Normalizations#wiki-bamCompare
+You can find more details on the bamCompare wiki page: https://github.com/fidelram/deepTools/wiki/Normalizations#wiki-bamCompare
 **Output files**:
diff -r ef1232cedacb -r c5787c91cab8 bamCorrelate.xml
--- a/bamCorrelate.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/bamCorrelate.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 correlates pairs of BAM files
@@ -142,23 +142,21 @@
 is to check the correlation between replicates or published data sets. The tool splits the genomes into bins of given length. For each bin, the number of reads
-found in each BAM file is counted and a correlation is computed for all
-pairs of BAM files.
+found in each BAM file is counted and a correlation (either Pearson or Spearman) is computed for all
+pairs of BAM files. Finally, a heatmap is drawn based on the similarity of the samples.
 .. image:: $PATH_TO_IMAGES/QC_bamCorrelate_humanSamples.png
 :alt: Heatmap of RNA Polymerase II ChIP-seq
-You can find more details in the `bamCorrelate wiki`_.
-
-.. _bamCorrelate wiki: https://github.com/fidelram/deepTools/wiki/Normalizations#wiki-bamCompare
+You can find more details on the bamCorrelate wiki page: https://github.com/fidelram/deepTools/wiki/QC#wiki-bamCorrelate
 **Output files**:
-- diagnostic plot produced by bamCorrelate is a clustered heatmap displaying the values for each pair-wise correlation, see below for an example
-- data matrix (optional) in case you want to plot the correlation values using a different program, e.g. R, this matrix can be used
+- **diagnostic plot**: clustered heatmap displaying the values for each pair-wise correlation, see below for an example
+- data matrix (optional): if you want to plot the correlation values using a different program, e.g. R, this matrix can be used
 -----
diff -r ef1232cedacb -r c5787c91cab8 bamCoverage.xml
--- a/bamCoverage.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/bamCoverage.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 generates a coverage bigWig file from a given BAM file. Multiple options are available to count reads and normalize coverage. (bam2bigwig)
@@ -31,8 +31,8 @@
 --scaleFactor $scaling.scaleFactor
 #end if
- #if str(region).strip() != '':
- --region 'region'
+ #if str($region).strip() != '':
+ --region '$region'
 #end if
 #if $advancedOpt.showAdvancedOpt == "yes":
@@ -133,18 +133,16 @@
 **What it does**
-Given a BAM file, this tool generates a bigWig or bedGraph file of fragment or read coverages.
+Given a BAM file, this tool generates a bigWig or bedGraph file with genome-wide coverage of fragment or read coverages.
 The way the method works is by first calculating all the number of reads (either extended to match the fragment length or not)
-that overlap each bin in the genome. Bins with zero counts are skipped, i.e. not added to the output file.
+that overlap each bin (a region of fixed length, i.e. 25 bp) in the genome. Bins with zero counts are skipped, i.e. not added to the output file.
 The resulting read counts can be normalized using either a given scaling factor, the RPKM formula or to get a 1x depth of coverage (RPGC).
 .. image:: $PATH_TO_IMAGES/norm_IGVsnapshot_indFiles.png
-You can find more details in the `bamCoverage wiki`_.
-
-.. _bamCoverage wiki: https://github.com/fidelram/deepTools/wiki/Normalizations#wiki-bamCoverage
+You can find more details on the bamCoverage wiki page: https://github.com/fidelram/deepTools/wiki/Normalizations#wiki-bamCoverage
 **Output files**:
diff -r ef1232cedacb -r c5787c91cab8 bamFingerprint.xml
--- a/bamFingerprint.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/bamFingerprint.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 plots profiles of BAM files; useful for assesing ChIP signal strength
@@ -30,8 +30,8 @@
 --plotFileFormat 'png'
 #end if
- #if str(region).strip() != '':
- --region 'region'
+ #if str($region).strip() != '':
+ --region '$region'
 #end if
 #if $advancedOpt.showAdvancedOpt == "yes":
@@ -120,11 +120,13 @@
 **What it does**
-This tool is based on a method developed by Diaz et al. (2012). Stat Appl Genet Mol Biol 11(3).
-The resulting plot can be used to assess the strength of a ChIP (for factors that bind to narrow regions).
+This tool is useful to assess the strength of a ChIP (i.e. how clearly the enrichment signal can be separated from the background signal)
+and it is based on a method developed by Diaz et al. (2012) Stat Appl Genet Mol Biol 11(3).
+
 The tool first samples indexed BAM files and counts all reads overlapping a window (bin) of specified length.
-These counts are then sorted according to their rank and the cumulative sum of read counts are plotted. An ideal input
-with perfect uniform distribution of reads along the genome (i.e. without enrichments in open chromatin etc.) should
+These counts are then sorted according to their rank (the bin with the highest number of reads has the highest rank)
+and the cumulative sum of read counts are plotted. An ideal input (control sample) with perfect uniform distribution of reads
+along the genome (i.e. without enrichments in open chromatin etc.) should
 generate a straight diagonal line. A very specific and strong ChIP enrichment will be indicated by a prominent and steep rise of the cumulative sum towards the highest rank. This means that a big chunk of reads from the ChIP sample is located in few bins which corresponds to high, narrow enrichments seen for transcription factors.
@@ -133,9 +135,7 @@
 .. image:: $PATH_TO_IMAGES/QC_fingerprint.png
-You can find more details in the `bamFingerprint wiki`_.
-
-.. _bamFingerprint wiki: https://github.com/fidelram/deepTools/wiki/QC#wiki-bamFingerprint
+You can find more details on the bamFingerprint wiki page: https://github.com/fidelram/deepTools/wiki/QC#wiki-bamFingerprint
 **Output files**:
diff -r ef1232cedacb -r c5787c91cab8 bigwigCompare.xml
--- a/bigwigCompare.xml Tue Feb 04 08:56:19 2014 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,102 +0,0 @@
-
- normalizes and compares two bigWig files to obtain the ratio, log2ratio or difference
-
-
-
- bigwigCompare
- deepTools_macros.xml
-
-
- bigwigCompare
-
- @THREADS@
-
- --bigwig1 '$bigwigFile1'
- --bigwig2 '$bigwigFile2'
-
- --outFileName '$outFileName'
- --outFileFormat '$outFileFormat'
-
- --ratio $comparison_type
-
- #if str(region).strip() != '':
- --region 'region'
- #end if
-
- #if $advancedOpt.showAdvancedOpt == "yes":
-
- --missingDataAsZero $advancedOpt.missingDataAsZero
- --scaleFactors '$advancedOpt.scaleFactor1:$advancedOpt.scaleFactor2'
- --pseudocount '$advancedOpt.pseudocount'
- --binSize $advancedOpt.binSize
-
- #end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-This tool compares two bigwig files based on the number of mapped reads. To
-compare the bigwig files the genome is partitioned into bins of equal size,
-then the number of reads found in each BAM file are counted for such bins and
-finally a summarizing value is reported. This value can be the ratio of the
-number of reads per bin, the log2 of the ratio, the sum or the difference.
-
-
------
-
-@REFERENCES@
-
-
-
diff -r ef1232cedacb -r c5787c91cab8 computeGCBias.xml
--- a/computeGCBias.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/computeGCBias.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 to see whether your samples should be normalized for GC bias
@@ -25,8 +25,8 @@
 --effectiveGenomeSize $effectiveGenomeSize.effectiveGenomeSize_opt
 #end if
- #if str(region).strip() != '':
- --region 'region'
+ #if str($region).strip() != '':
+ --region '$region'
 #end if
 #if $advancedOpt.showAdvancedOpt == "yes":
@@ -112,7 +112,7 @@
 **What it does**
-This tool computes the GC bias using the method proposed by Benjamini and Speed (2012). Nucleic Acids Res. (see below for more explanations)
+This tool computes the GC bias using the method proposed by Benjamini and Speed (2012) Nucleic Acids Res. (see below for more explanations)
 The output is used to plot the bias and can also be used later on to correct the bias with the tool correctGCbias. There are two plots produced by the tool: a boxplot showing the absolute read numbers per genomic-GC bin and an x-y plot depicting the ratio of observed/expected reads per genomic GC content bin.
@@ -132,9 +132,7 @@
 .. image:: $PATH_TO_IMAGES/QC_GCplots_input.png
-You can find more details in the `computeGCBias wiki`_.
-
-.. _computeGCBias wiki: https://github.com/fidelram/deepTools/wiki/QC#wiki-computeGCbias
+You can find more details on the computeGCBias wiki page: computeGCBias wiki: https://github.com/fidelram/deepTools/wiki/QC#wiki-computeGCbias
 **Output files**:
diff -r ef1232cedacb -r c5787c91cab8 computeMatrix.xml
--- a/computeMatrix.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/computeMatrix.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 summarizes and prepares an intermediary file containing scores associated with genomic regions that can be used afterwards to plot a heatmap or a profile
@@ -194,23 +194,24 @@
 **What it does**
-This tool summarizes and prepares an intermediary file
-containing scores associated with genomic regions that can be used
+This tool prepares an intermediary file (a gzipped table of values)
+that contains scores associated with genomic regions that can be used
 afterwards to plot a heatmap or a profile. Genomic regions can really be anything - genes, parts of genes, ChIP-seq peaks, favorite genome regions... as long as you provide a proper file
-in BED or INTERVAL format. This tool can also be used to filter and sort
-regions according to their score.
+in BED or INTERVAL format. If you would like to compare different groups of regions
+(i.e. genes from chromosome 2 and 3), you can supply more than 1 BED file, one for each group.
+
+computeMatrix can also be used to filter and sort
+regions according to their score by making use of its advanced output options.
 .. image:: $PATH_TO_IMAGES/flowChart_computeMatrixetc.png
 :alt: Relationship between computeMatrix, heatmapper and profiler
-You can find more details in the `computeMatrix wiki`_.
-
-.. _computeMatrix wiki: https://github.com/fidelram/deepTools/wiki/Visualizations#wiki-computeMatrix
+You can find more details on the computeMatrix wiki page: https://github.com/fidelram/deepTools/wiki/Visualizations#wiki-computeMatrix
 -----
diff -r ef1232cedacb -r c5787c91cab8 correctGCBias.xml
--- a/correctGCBias.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/correctGCBias.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 uses the output from computeGCBias to generate corrected BAM files
@@ -33,8 +33,8 @@
 --effectiveGenomeSize $effectiveGenomeSize.effectiveGenomeSize_opt
 #end if
- #if str(region).strip() != '':
- --region 'region'
+ #if str($region).strip() != '':
+ --region '$region'
 #end if
 #if $advancedOpt.showAdvancedOpt == "yes":
@@ -88,13 +88,11 @@
 **What it does**
-This tool requires the output from computeGCBias to correct the given BAM files according to the method proposed by Benjamini and Speed (2012). Nucleic Acids Res.
-The resulting BAM files can be used in any downstream analyses, but be aware that you should not filter out duplicates from here on.
-
+This tool requires the output from computeGCBias to correct a given BAM file according to the method proposed by
+Benjamini and Speed (2012) Nucleic Acids Res.
+The resulting BAM file can be used in any downstream analyses, but be aware that you should not filter out duplicates from here on.
-You can find more details in the `correctGCBias wiki`_.
-
-.. _correctGCBias wiki: https://github.com/fidelram/deepTools/wiki/QC#wiki-correctGCbias
+You can find more details on the correctGCBias wiki page: https://github.com/fidelram/deepTools/wiki/QC#wiki-correctGCbias
 **Output files**:
diff -r ef1232cedacb -r c5787c91cab8 deepTools_macros.xml
--- a/deepTools_macros.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/deepTools_macros.xml Tue Feb 04 09:02:21 2014 -0500
@@ -50,7 +50,7 @@
 samtools
 deepTools
 ucsc_tools
- deepTools
+ deepTools
 ucsc_tools
 numpy
 pysam
@@ -67,7 +67,7 @@
@@ -200,7 +200,7 @@
-
+
diff -r ef1232cedacb -r c5787c91cab8 heatmapper.xml
--- a/heatmapper.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/heatmapper.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 creates a heatmap for a score associated to genomic regions
@@ -185,22 +185,21 @@
 **What it does**
-The heatmapper visualizes scores associated with genomic regions, for example ChIP enrichment values around the TSS of genes.
-Those values can be visualized individually along each of the regions provided by the user in INTERVAL or BED format.
-In addition to the heatmap, an average profile plot is plotted on top of the heatmap (can be turned off by the user;
-it can also be generated separately by the tool profiler).
-We implemented vast optional parameters and we encourage you to play around with the min/max values displayed in the heatmap as well as
-with the different coloring options. If you would like to plot heatmaps for different groups of genomic regions individually,
-e.g. one plot per chromosome, simply supply each group as an individual BED file.
+The heatmapper visualizes scores associated with genomic regions, for example ChIP enrichment values around the TSS of genes.
+Like profiler, it requires that computeMatrix was run first to calculate the values.
+
+We implemented vast optional parameters to optimize the visual output and we encourage you to play around with the min/max values displayed in the heatmap as well as
+with the different coloring options. The most powerful option is the k-means clustering where you simply need to indicate the number of
+groups with similar read distributions that you expect and the algorithm will do the sorting for you.
+
+Do check the examples on our help page with step-by-step protocols: https://github.com/fidelram/deepTools/wiki/Example-workflows
 .. image:: $PATH_TO_IMAGES/visual_hm_DmelPolII.png
 :alt: Heatmap of RNA Polymerase II ChIP-seq
-You can find more details in the `heatmapper wiki`_.
-
-.. _heatmapper wiki: https://github.com/fidelram/deepTools/wiki/Visualizations#wiki-heatmapper
+You can find more details on the tool itself on the heatmapper wiki page: https://github.com/fidelram/deepTools/wiki/Visualizations#wiki-heatmapper
 -----
diff -r ef1232cedacb -r c5787c91cab8 profiler.xml
--- a/profiler.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/profiler.xml Tue Feb 04 09:02:21 2014 -0500
@@ -1,4 +1,4 @@
-
+
 creates a profile plot for a score associated to genomic regions
@@ -141,9 +141,9 @@
 **What it does**
 This tool plots the average enrichments over all genomic
-regions supplied to computeMarix. It is a very useful complement to the
-heatmapper, especially in cases when you want to compare the scores for
-many different groups. Like heatmapper, profiler does not change the
+regions supplied to computeMarix. It requires that computeMatrix was successfully run.
+It is a very useful complement to the heatmapper, especially in cases when you want to
+compare the scores for many different groups. Like heatmapper, profiler does not change the
 values that were compute by computeMatrix, but you can choose between many different ways to color and display the plots.
@@ -152,9 +152,7 @@
 :alt: Meta-gene profile of Rna Polymerase II
-You can find more details in the `profiler wiki`_.
-
-.. _profiler wiki: https://github.com/fidelram/deepTools/wiki/Visualizations#wiki-profiler
+You can find more details on the profiler wiki page: https://github.com/fidelram/deepTools/wiki/Visualizations#wiki-profiler
 -----
diff -r ef1232cedacb -r c5787c91cab8 tool_dependencies.xml
--- a/tool_dependencies.xml Tue Feb 04 08:56:19 2014 -0500
+++ b/tool_dependencies.xml Tue Feb 04 09:02:21 2014 -0500
@@ -57,7 +57,7 @@
 The tools downloaded by this dependency definition are free for academic use.
 TODO: UCSC tools are only available with their latest version. That is not good for reproducibility.
-
+
 git clone --recursive https://github.com/fidelram/deepTools.git
@@ -79,7 +79,7 @@
- git reset --hard 1093b2d281576f23ee04740bd5eae3f7b8422f7e
+ git reset --hard 3268f7e1458f3a520ab6fea3039971ee9d7a6d5b
 $INSTALL_DIR/lib/python
 export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python &&