# HG changeset patch # User bernhardlutz # Date 1401996318 14400 # Node ID b78d20957e7fa352b5d632ed8e6508e9fb2ac83f # Parent 8f7e5aaf16a40aff067b6640413e2b477ebfd8fa Uploaded diff -r 8f7e5aaf16a4 -r b78d20957e7f BedToBam.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/BedToBam.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,34 @@ + + + + macros.xml + + + + + bedToBam + $ubam + $bed12 + -mapq $mapq + -i '$input' + > '$output' + + + + + + + + + + + + + + +**What it does** + +bedToBam converts features in a feature file to BAM format. This is useful as an efficient means of storing large genome annotations in a compact, indexed format for visualization purposes. + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f bamToBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/bamToBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,54 @@ + + + + macros.xml + + + + + bamToBed + $option + $ed_score + -i '$input' + > '$output' + #if str($tag): + -tag $tag + #end if + + + + + + + + + + + + + + + + + +**What it does** + +bedtools bamtobed is a conversion utility that converts sequence alignments in BAM format into BED, BED12, and/or BEDPE records. + +.. class:: infomark + +The "Report spliced BAM alignment..." option breaks BAM alignments with the "N" (splice) operator into distinct BED entries. For example, using this option on a CIGAR such as 50M1000N50M would, by default, produce a single BED record that spans 1100bp. However, using this option, it would create two separate BED records that are each 50bp in size and are separated by 1000bp (the size of the N operation). This is important for RNA-seq and structural variation experiments. + + +.. class:: warningmark + +If using a custom BAM alignment TAG as the BED score, note that this must be a numeric tag (e.g., type "i" as in NM:i:0). + +.. class:: warningmark + +If creating a BEDPE output (see output formatting options), the BAM file should be sorted by query name. 
+ +@REFERENCES@ + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f closestBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/closestBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,41 @@ + + + + macros.xml + + + + + closestBed + $split + $strand + $addition + -t $ties + -a $inputA + -b $inputB + > $output + + + + + + + + + + + + + + + + + + + +**What it does** + +Similar to intersectBed, closestBed searches for overlapping features in A and B. In the event that no feature in B overlaps the current feature in A, closestBed will report the closest (that is, least genomic distance from the start or end of A) feature in B. For example, one might want to find which is the closest gene to a significant GWAS polymorphism. Note that closestBed will report an overlapping feature as the closest; that is, it does not restrict the search to the closest non-overlapping feature. + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f clusterBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/clusterBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,37 @@ + + + + macros.xml + + + + + clusterBed + $strand + -d $distance + -i $inputA + > $output + + + + + + + + + + + + +**What it does** + +Similar to merge, cluster reports each set of overlapping or “book-ended” features in an interval file. In contrast to merge, cluster does not flatten the cluster of intervals into a new meta-interval; instead, it assigns a unique cluster ID to each record in each cluster. This is useful for having fine control over how sets of overlapping intervals in a single interval file are combined. + +.. image:: $PATH_TO_IMAGES/cluster-glyph.png + +.. class:: warningmark + +bedtools cluster requires that you presort your data by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files).
+ + + diff -r 8f7e5aaf16a4 -r b78d20957e7f complementBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/complementBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,30 @@ + + + + macros.xml + + + + + complementBed + -d $distance + -g genome + > $output + + + + + + + + + + +**What it does** + +bedtools complement returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file. + +.. image:: $PATH_TO_IMAGES/complement-glyph.png + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f coverageBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/coverageBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,53 @@ + + of features in file A across the features in file B (coverageBed) + + macros.xml + + + + + coverageBed + #if $inputA.ext == "bam" + -abam '$inputA' + #else + -a '$inputA' + #end if + -b '$inputB' + $split + $strand + | sort -k1,1 -k2,2n + > '$output' + + + + + + + + + + + + + + + + + +**What it does** + +coverageBed_ computes both the depth and breadth of coverage of features in file A across the features in file B. For example, coverageBed can compute the coverage of sequence alignments (file A) across 1 kilobase (arbitrary) windows (file B) tiling a genome of interest. One advantage that coverageBed offers is that it not only counts the number of features that overlap an interval in file B, it also computes the fraction of bases in each B interval that were overlapped by one or more features. +Thus, coverageBed also computes the breadth of coverage for each interval in B. + +.. _coverageBed: http://bedtools.readthedocs.org/en/latest/content/tools/coverage.html + +.. class:: infomark + +The output file will consist of each interval from your original target BED file, plus an additional column indicating the number of intervals in your source file that overlapped that target interval.
+ +@REFERENCES@ + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f flankbed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/flankbed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,59 @@ + + + + macros.xml + + + + + flankBed + $pct + $strand + -g $genome + -i $inputA + > $output + #if $addition.addition_select == 'b': + -b $b + #else: + -l $l + -r $r + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +bedtools flank will optionally create flanking intervals whose size is a user-specified fraction of the original interval. + +.. image:: $PATH_TO_IMAGES/flank-glyph.png + +.. class:: warningmark + +In order to prevent creating intervals that violate chromosome boundaries, bedtools flank requires a genome file defining the length of each chromosome or contig. + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f genomeCoverageBed_bedgraph.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/genomeCoverageBed_bedgraph.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,122 @@ + + + + + macros.xml + + + + + genomeCoverageBed + #if $input.ext == "bam" + -ibam '$input' + #else + -i '$input' + -g ${chromInfo} + #end if + + #if str($scale): + -scale $scale + #end if + + -bg + $zero_regions + $split + $strand + > '$output' + + + + + + + + + + + + + + + + + + + + + + +$ cat A.bed +chr1 10 20 +chr1 20 30 +chr2 0 500 + +$ cat my.genome +chr1 1000 +chr2 500 + +$ bedtools genomecov -i A.bed -g my.genome +chr1 0 980 1000 0.98 +chr1 1 20 1000 0.02 +chr2 1 500 500 1 +genome 0 980 1500 0.653333 +genome 1 520 1500 0.346667 + + + + +**What it does** + +This tool calculates the genome-wide coverage of intervals defined in a BAM or BED file and reports it in BedGraph format. + +.. class:: warningmark + +The input BED or BAM file must be sorted by chromosome name (but doesn't necessarily have to be sorted by start position).
+ +----- + +**Example 1** + +Input (BED format)- +Overlapping, un-sorted intervals:: + + chr1 140 176 + chr1 100 130 + chr1 120 147 + + +Output (BedGraph format)- +Sorted, non-overlapping intervals, with coverage value on the 4th column:: + + chr1 100 120 1 + chr1 120 130 2 + chr1 130 140 1 + chr1 140 147 2 + chr1 147 176 1 + +----- + +**Example 2 - with ZERO-Regions selected (assuming hg19)** + +Input (BED format)- +Overlapping, un-sorted intervals:: + + chr1 140 176 + chr1 100 130 + chr1 120 147 + + +Output (BedGraph format)- +Sorted, non-overlapping intervals, with coverage value on the 4th column:: + + chr1 0 100 0 + chr1 100 120 1 + chr1 120 130 2 + chr1 130 140 1 + chr1 140 147 2 + chr1 147 176 1 + chr1 176 249250621 0 + +@REFERENCES@ + + diff -r 8f7e5aaf16a4 -r b78d20957e7f genomeCoverageBed_histogram.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/genomeCoverageBed_histogram.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,67 @@ + + + + + macros.xml + + + + + genomeCoverageBed + #if $input.ext == "bam" + -ibam '$input' + #else + -i '$input' + -g ${chromInfo} + #end if + #if str($max): + -max $max + #end if + > '$output' + + + + + + + + + + + + +**What it does** + +This tool calculates a histogram of genome coverage depth based on mapped reads in BAM format or intervals in BED format. + + +------ + + +.. class:: infomark + +The output file will contain five columns: + + * 1. Chromosome name (or 'genome' for whole-genome coverage) + * 2. Coverage depth + * 3. The number of bases on chromosome (or genome) with depth equal to column 2. + * 4. The size of chromosome (or entire genome) in base pairs + * 5. The fraction of bases on chromosome (or entire genome) with depth equal to column 2. 
+ +**Example Output**:: + + chr2L 0 1379895 23011544 0.0599653 + chr2L 1 837250 23011544 0.0363839 + chr2L 2 904442 23011544 0.0393038 + chr2L 3 913723 23011544 0.0397072 + chr2L 4 952166 23011544 0.0413778 + chr2L 5 967763 23011544 0.0420555 + chr2L 6 986331 23011544 0.0428624 + chr2L 7 998244 23011544 0.0433801 + chr2L 8 995791 23011544 0.0432735 + chr2L 9 996398 23011544 0.0432999 + + +@REFERENCES@ + + diff -r 8f7e5aaf16a4 -r b78d20957e7f getfastaBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/getfastaBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,45 @@ + + + + macros.xml + + + + + bedtools getfasta + + $name + $tab + $strand + $split + -fi $fasta + -bed $inputA + -fo $output + + + + + + + + + + + + + + + +**What it does** + +bedtools getfasta will extract the sequence defined by the coordinates in a BED interval and create a new FASTA entry in the output file for each extracted sequence. By default, the FASTA header for each extracted sequence will be formatted as follows: “:-”. + +.. image:: $PATH_TO_IMAGES/getfasta-glyph.png + +.. class:: warningmark + +1. The headers in the input FASTA file must exactly match the chromosome column in the BED file. + +2. You can use the UNIX fold command to set the line width of the FASTA output. For example, fold -w 60 will make each line of the FASTA file have at most 60 nucleotides for easy viewing. 
+ + diff -r 8f7e5aaf16a4 -r b78d20957e7f intersectBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/intersectBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,85 @@ + + + + macros.xml + + + + + intersectBed + #if $inputA.ext == "bam": + -abam $inputA + #else: + -a $inputA + #end if + + -b $inputB + $split + $strand + #if str($fraction): + -f $fraction + #end if + $reciprocal + $invert + $once + $header + $overlap_mode + > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +By far, the most common question asked of two sets of genomic features is whether or not any of the features in the two sets “overlap” with one another. This is known as feature intersection. bedtools intersect allows one to screen for overlaps between two sets of genomic features. Moreover, it allows one to have fine control as to how the intersections are reported. bedtools intersect works with both BED/GFF/VCF and BAM files as input. + +.. image:: $PATH_TO_IMAGES/intersect-glyph.png + +.. class:: infomark + +Note that each BAM alignment is treated individually. Therefore, if one end of a paired-end alignment overlaps an interval in the BED file, yet the other end does not, the output file will only include the overlapping end. + +.. class:: infomark + +Note that a BAM alignment will be sent to the output file **once** even if it overlaps more than one interval in the BED file. + +@REFERENCES@ + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f intersectBed_bam_obsolete.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/intersectBed_bam_obsolete.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,135 @@ + + reports overlaps between two feature files. 
+ + bedtools + intersectBed + + intersectBed +#if $intype.inselect == "bam" +-abam $intype.inputBam -b $input $intype.bed +#else +-a $intype.inputBed -b $input +#end if +#if $outputopt.showoutputopt == "yes" +$outputopt.wa $outputopt.wb $outputopt.wo $outputopt.wao $outputopt.u $outputopt.c $outputopt.v +#end if +#if $overlapopt.showoverlapopt == "yes" + #if str($overlapopt.f.value) != "None" + -f $overlapopt.f + #end if +$overlapopt.r $overlapopt.s +#end if +$split +> $output + + + + + + + + + + + + + + + + + + + + + + + + s + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +By far, the most common question asked of two sets of genomic features is whether or not any of the +features in the two sets "overlap" with one another. This is known as feature intersection. intersectBed +allows one to screen for overlaps between two sets of genomic features. Moreover, it allows one to have +fine control as to how the intersections are reported. intersectBed works with both BED/GFF +and BAM files as input. + +By default, if an overlap is found, intersectBed reports the shared interval between the two +overlapping features. + + +**Default behavior when using BAM input** + +When comparing alignments in BAM format to features in BED format, intersectBed +will, by default, write the output in BAM format. That is, each alignment in the BAM file that meets +the user's criteria will be written in BAM format. This serves as a mechanism to +create subsets of BAM alignments that are of biological interest. Note that each mate in a BAM +alignment is compared to the BED file individually. Thus, if only one end of a paired-end sequence overlaps with a +feature in B, then that end will be written to the BAM output. By contrast, the other mate for the +pair will not be written. One should use pairToBed if one wants each BAM alignment +for a pair to be written to BAM output.
+ + +**Output BED format when using BAM input** + +When comparing alignments in BAM format to features in BED format, intersectBed +will optionally write the output in BED format. That is, each alignment in the BAM file is converted +to a 6 column BED feature and if overlaps are found (or not) based on the user's criteria, the BAM +alignment will be reported in BED format. The BED "name" field is comprised of the RNAME field in +the BAM alignment. If mate information is available, the mate (e.g., "/1" or "/2") field will be +appended to the name. The "score" field is the mapping quality score from the BAM alignment. + + + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f jaccardBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/jaccardBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,44 @@ + + + + macros.xml + + + + + bedtools jaccard + + $reciprocal + $strand + $split + -f $overlap + -a $inputA + -b $inputB + > $output + + + + + + + + + + + + + + + + +**What it does** + +By default, bedtools jaccard reports the length of the intersection, the length of the union (minus the intersection), the final Jaccard statistic reflecting the similarity of the two sets, as well as the number of intersections. +Whereas the bedtools intersect tool enumerates each and every intersection between two sets of genomic intervals, one often needs a single statistic reflecting the similarity of the two sets based on the intersections between them. The Jaccard statistic is used in set theory to represent the ratio of the intersection of two sets to the union of the two sets. Similarly, Favorov et al. [1] reported the use of the Jaccard statistic for genome intervals: specifically, it measures the ratio of the number of intersecting base pairs between two sets to the number of base pairs in the union of the two sets. The bedtools jaccard tool implements this statistic, yet modifies the statistic such that the length of the intersection is subtracted from the length of the union.
As a result, the final statistic ranges from 0.0 to 1.0, where 0.0 represents no overlap and 1.0 represents complete overlap. +.. image:: $PATH_TO_IMAGES/jaccard-glyph.png + +.. class:: warningmark + +The jaccard tool requires that your data is pre-sorted by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files). + + diff -r 8f7e5aaf16a4 -r b78d20957e7f linksBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/linksBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,31 @@ + + + + macros.xml + + + + + linksBed + -base $basename + -org $org + -db $db + -i $inputA + > $output + + + + + + + + + + + + +**What it does** + +Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a file. This is useful for cases when one wants to manually inspect a large set of annotations or features. + + diff -r 8f7e5aaf16a4 -r b78d20957e7f macros.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/macros.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,42 @@ + + + + + + + + + + + + + + + + + + + + + + + + + bedtools + + bedtools --version + + + + + +------ + +This tool is part of the `bedtools package`__ from the `Quinlan laboratory`__. +If you use this tool, please cite `Quinlan AR, and Hall I.M. BEDTools: A flexible framework for comparing genomic features. Bioinformatics, 2010, 26, 6.`__ + +.. __: https://github.com/arq5x/bedtools2 +.. __: http://cphg.virginia.edu/quinlan/ +..
__: http://bioinformatics.oxfordjournals.org/content/26/6/841.short + + diff -r 8f7e5aaf16a4 -r b78d20957e7f mapBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/mapBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,75 @@ + + + + macros.xml + + + + + bedtools map + -a $inputA + -b $inputB + $strand + -o $operation + -c $col + -f $overlap + $reciprocal + $split + $header + #if $genome.genome_choose == "-g" : + -g $genome.genome + #end if + > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +bedtools map allows one to map overlapping features in a B file onto features in an A file and apply statistics and/or summary operations on those features. + +.. image:: $PATH_TO_IMAGES/map-glyph.png + +.. class:: infomark + +bedtools map requires each input file to be sorted by genome coordinate. For BED files, this can be done with sort -k1,1 -k2,2n. Other sorting criteria are allowed if a genome file (-g) is provided that specifies the expected chromosome order. + +.. class:: infomark + +The map tool is substantially faster in versions 2.19.0 and later. The plot below demonstrates the increased speed when, for example, counting the number of exome alignments that align to each exon. The bedtools times are compared to the bedops bedmap utility as a point of reference. + + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f maskFastaBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/maskFastaBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,35 @@ + + + + macros.xml + + + + + bedtools maskfasta + $soft + -mc $mc + -fi $fasta + -bed $inputA + -fo $output + + + + + + + + + + + + + +**What it does** + +bedtools maskfasta masks sequences in a FASTA file based on intervals defined in a feature file. The headers in the input FASTA file must exactly match the chromosome column in the feature file.
This may be useful for creating your own masked genome file based on custom annotations or for masking all but your target regions when aligning sequence data from a targeted capture experiment. + +.. image:: $PATH_TO_IMAGES/maskfasta-glyph.png + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f mergeBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/mergeBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,209 @@ + + (mergeBed) + + macros.xml + + + + + mergeBed + -i $input + $strandedness + $report_number + -d $distance + $nms + #if str($scores) != 'none' + -scores $scores + #end if + > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +bedtools merge combines overlapping or "book-ended" features in an interval file into a single feature which spans all of the combined features. + + +.. image:: $PATH_TO_IMAGES/merge-glyph.png + + +.. class:: warningmark + +bedtools merge requires that you presort your data by chromosome and then by start position. + + +========================================================================== +Default behavior +========================================================================== +By default, ``bedtools merge`` combines overlapping (by at least 1 bp) and/or +bookended intervals into a single, "flattened" or "merged" interval. + +:: + + $ cat A.bed + chr1 100 200 + chr1 180 250 + chr1 250 500 + chr1 501 1000 + + $ bedtools merge -i A.bed + chr1 100 500 + chr1 501 1000 + + +========================================================================== +*-s* Enforcing "strandedness" +========================================================================== +The ``-s`` option will only merge intervals that are overlapping/bookended +*and* are on the same strand.
+ +:: + + $ cat A.bed + chr1 100 200 a1 1 + + chr1 180 250 a2 2 + + chr1 250 500 a3 3 - + chr1 501 1000 a4 4 + + + $ bedtools merge -i A.bed -s + chr1 100 250 + + chr1 501 1000 + + chr1 250 500 - + + +========================================================================== +*-n* Reporting the number of features that were merged +========================================================================== +The -n option will report the number of features that were combined from the +original file in order to make the newly merged feature. If a feature in the +original file was not merged with any other features, a "1" is reported. + +:: + + $ cat A.bed + chr1 100 200 + chr1 180 250 + chr1 250 500 + chr1 501 1000 + + $ bedtools merge -i A.bed -n + chr1 100 500 3 + chr1 501 1000 1 + + +========================================================================== +*-d* Controlling how close two features must be in order to merge +========================================================================== +By default, only overlapping or book-ended features are combined into a new +feature. However, one can force ``merge`` to combine more distant features +with the ``-d`` option. For example, were one to set ``-d`` to 1000, any +features that overlap or are within 1000 base pairs of one another will be +combined. + +:: + + $ cat A.bed + chr1 100 200 + chr1 501 1000 + + $ bedtools merge -i A.bed + chr1 100 200 + chr1 501 1000 + + $ bedtools merge -i A.bed -d 1000 + chr1 100 1000 + + +============================================================= +*-nms* Reporting the names of the features that were merged +============================================================= +Occasionally, one might like to know the names of the features that were +merged into a new feature. The ``-nms`` option will add an extra column to the +``merge`` output which lists (separated by semicolons) the names of the +merged features.
+ +:: + + $ cat A.bed + chr1 100 200 A1 + chr1 150 300 A2 + chr1 250 500 A3 + + $ bedtools merge -i A.bed -nms + chr1 100 500 A1,A2,A3 + + +=============================================================== +*-scores* Reporting the scores of the features that were merged +=============================================================== +Similarly, we might like to know the scores of the features that were +merged into a new feature. Enter the ``-scores`` option. One can specify +how the scores from each overlapping interval should be reported. + +:: + + $ cat A.bed + chr1 100 200 A1 1 + chr1 150 300 A2 2 + chr1 250 500 A3 3 + + $ bedtools merge -i A.bed -scores mean + chr1 100 500 2 + + $ bedtools merge -i A.bed -scores max + chr1 100 500 3 + + $ bedtools merge -i A.bed -scores collapse + chr1 100 500 1,2,3 + + +@REFERENCES@ + + diff -r 8f7e5aaf16a4 -r b78d20957e7f multiIntersectBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/multiIntersectBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,195 @@ + + + + macros.xml + + + + + multiIntersectBed + $header + #if $zero.value == True: + -empty + -g ${chromInfo} + #end if + + -i '$input1' + '$input2' + #for $q in $beds + '${q.input}' + #end for + + -names + #if $name1.choice == "tag": + '${input1.name}' + #else + '${name1.custom_name}' + #end if + + #if $name2.choice == "tag": + '${input2.name}' + #else + '${name2.custom_name}' + #end if + + #for $q in $beds + #if $q.name.choice == "tag": + '${q.input.name}' + #else + '${q.input.custom_name}' + #end if + #end for + > '$output' + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool identifies common intervals among multiple, sorted BED files. Intervals can be common among 0 to N of the N input BED files. The pictorial and raw data examples below illustrate the behavior of this tool more clearly. + + +..
image:: http://people.virginia.edu/~arq5x/files/bedtools-galaxy/mbi.png + + +.. class:: warningmark + +This tool requires that each BED file is reference-sorted (chrom, then start). + + +.. class:: infomark + +The output file will contain five fixed columns, plus additional columns for each BED file: + + * 1. Chromosome name (or 'genome' for whole-genome coverage). + * 2. The zero-based start position of the interval. + * 3. The one-based end position of the interval. + * 4. The number of input files that had at least one feature overlapping this interval. + * 5. A list of input files or labels that had at least one feature overlapping this interval. + * 6. For each input file, an indication (1 = Yes, 0 = No) of whether or not the file had at least one feature overlapping this interval. + +------ + +**Example input**:: + + # a.bed + chr1 6 12 + chr1 10 20 + chr1 22 27 + chr1 24 30 + + # b.bed + chr1 12 32 + chr1 14 30 + + # c.bed + chr1 8 15 + chr1 10 14 + chr1 32 34 + + +------ + +**Example without a header and without reporting intervals with zero coverage**:: + + + chr1 6 8 1 1 1 0 0 + chr1 8 12 2 1,3 1 0 1 + chr1 12 15 3 1,2,3 1 1 1 + chr1 15 20 2 1,2 1 1 0 + chr1 20 22 1 2 0 1 0 + chr1 22 30 2 1,2 1 1 0 + chr1 30 32 1 2 0 1 0 + chr1 32 34 1 3 0 0 1 + + +**Example adding a header line**:: + + + chrom start end num list a.bed b.bed c.bed + chr1 6 8 1 1 1 0 0 + chr1 8 12 2 1,3 1 0 1 + chr1 12 15 3 1,2,3 1 1 1 + chr1 15 20 2 1,2 1 1 0 + chr1 20 22 1 2 0 1 0 + chr1 22 30 2 1,2 1 1 0 + chr1 30 32 1 2 0 1 0 + chr1 32 34 1 3 0 0 1 + + +**Example adding a header line and custom file labels**:: + + + chrom start end num list joe bob sue + chr1 6 8 1 joe 1 0 0 + chr1 8 12 2 joe,sue 1 0 1 + chr1 12 15 3 joe,bob,sue 1 1 1 + chr1 15 20 2 joe,bob 1 1 0 + chr1 20 22 1 bob 0 1 0 + chr1 22 30 2 joe,bob 1 1 0 + chr1 30 32 1 bob 0 1 0 + chr1 32 34 1 sue 0 0 1 + + +@REFERENCES@ + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f overlapBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 
+++ b/overlapBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,28 @@ + + + + macros.xml + + + + + overlap + -i $inputA + -cols $cols + > $output + + + + + + + + + + +**What it does** + +overlap computes the amount of overlap (in the case of positive values) or distance (in the case of negative values) between feature coordinates occurring on the same input line and reports the result at the end of the same line. In this way, it is a useful method for computing custom overlap scores from the output of other BEDTools. + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f sortBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/sortBed.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,45 @@ + + + + macros.xml + + + + + sortBed -i $input $option > $output + + + + + + + + + + + + + + + + + + + + + +**What it does** + +Sorts a feature file by chromosome and other criteria. + + +.. class:: warningmark + +It should be noted that sortBed is merely a convenience utility, as the UNIX sort utility +will sort BED files more quickly while using less memory. 
For example, UNIX sort will sort a BED file +by chromosome then by start position in the following manner: sort -k1,1 -k2,2n a.bed + +@REFERENCES@ + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/cluster-glyph.png Binary file static/images/cluster-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/complement-glyph.png Binary file static/images/complement-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/flank-glyph.png Binary file static/images/flank-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/genomecov-glyph.png Binary file static/images/genomecov-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/getfasta-glyph.png Binary file static/images/getfasta-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/intersect-glyph.png Binary file static/images/intersect-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/jaccard-glyph.png Binary file static/images/jaccard-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/map-glyph.png Binary file static/images/map-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/maskfasta-glyph.png Binary file static/images/maskfasta-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f static/images/merge-glyph.png Binary file static/images/merge-glyph.png has changed diff -r 8f7e5aaf16a4 -r b78d20957e7f test-data/0.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/0.bed Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,4 @@ +chr1 100 200 +chr1 180 250 +chr1 250 500 +chr1 501 1000 diff -r 8f7e5aaf16a4 -r b78d20957e7f test-data/0_result.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/0_result.bed Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,2 @@ +chr1 100 500 +chr1 501 1000 diff -r 8f7e5aaf16a4 -r b78d20957e7f test-data/1.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/1.bed Thu Jun 05 15:25:18 2014 -0400 @@
-0,0 +1,4 @@ +chr1 100 200 a1 1 + +chr1 180 250 a2 2 + +chr1 250 500 a3 3 - +chr1 501 1000 a4 4 + diff -r 8f7e5aaf16a4 -r b78d20957e7f test-data/1_result.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/1_result.bed Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,3 @@ +chr1 100 250 + +chr1 501 1000 + +chr1 250 500 - diff -r 8f7e5aaf16a4 -r b78d20957e7f test-data/2.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/2.bed Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,4 @@ +chr1 100 200 +chr1 180 250 +chr1 250 500 +chr1 501 1000 diff -r 8f7e5aaf16a4 -r b78d20957e7f test-data/2_result.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/2_result.bed Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,2 @@ +chr1 100 500 3 +chr1 501 1000 1 diff -r 8f7e5aaf16a4 -r b78d20957e7f test-data/3.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/3.bed Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,2 @@ +chr1 100 200 +chr1 501 1000 diff -r 8f7e5aaf16a4 -r b78d20957e7f test-data/3_result_1000.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/3_result_1000.bed Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,1 @@ +chr1 100 200 1000 diff -r 8f7e5aaf16a4 -r b78d20957e7f tool_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_dependencies.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,6 @@ + + + + + + diff -r 8f7e5aaf16a4 -r b78d20957e7f unionBedGraphs.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/unionBedGraphs.xml Thu Jun 05 15:25:18 2014 -0400 @@ -0,0 +1,231 @@ + + + + macros.xml + + + + unionBedGraphs + $header + -filler '$filler' + #if $zero.value == True: + -empty + -g ${chromInfo} + #end if + + -i '$input1' + '$input2' + #for $q in $bedgraphs + '${q.input}' + #end for + + -names + #if $name1.choice == "tag": + '${input1.name}' + #else + '${name1.custom_name}' + #end if + + #if $name2.choice == "tag": + '${input2.name}' + #else + '${name2.custom_name}' + #end if + + #for $q in $bedgraphs + #if $q.name.choice == 
"tag": + '${q.input.name}' + #else + '${q.input.custom_name}' + #end if + #end for + > '$output' + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool merges multiple BedGraph files, allowing direct and fine-scale coverage comparisons among many samples/files. The BedGraph files need not represent the same intervals; the tool will identify both common and file-specific intervals. In addition, the BedGraph values need not be numeric: one can use any text as the BedGraph value and the tool will compare the values from multiple files. + +.. image:: http://people.virginia.edu/~arq5x/files/bedtools-galaxy/ubg.png + + +.. class:: warningmark + +This tool requires that each BedGraph file is reference-sorted (chrom, then start) and contains non-overlapping intervals (within a given file). + + +------ + +**Example input**:: + + # 1.bedgraph + chr1 1000 1500 10 + chr1 2000 2100 20 + + # 2.bedgraph + chr1 900 1600 60 + chr1 1700 2050 50 + + # 3.bedgraph + chr1 1980 2070 80 + chr1 2090 2100 20 + + +------ + +**Examples using the Zero Coverage checkbox** + +Output example (*without* checking "Report regions with zero coverage"):: + + chr1 900 1000 0 60 0 + chr1 1000 1500 10 60 0 + chr1 1500 1600 0 60 0 + chr1 1700 1980 0 50 0 + chr1 1980 2000 0 50 80 + chr1 2000 2050 20 50 80 + chr1 2050 2070 20 0 80 + chr1 2070 2090 20 0 0 + chr1 2090 2100 20 0 20 + + +Output example (*with* checking "Report regions with zero coverage"). 
The lines marked with (*) are not covered in any input file, but are still reported (The asterisk marking does not appear in the file).:: + + chr1 0 900 0 0 0 (*) + chr1 900 1000 0 60 0 + chr1 1000 1500 10 60 0 + chr1 1500 1600 0 60 0 + chr1 1600 1700 0 0 0 (*) + chr1 1700 1980 0 50 0 + chr1 1980 2000 0 50 80 + chr1 2000 2050 20 50 80 + chr1 2050 2070 20 0 80 + chr1 2070 2090 20 0 0 + chr1 2090 2100 20 0 20 + chr1 2100 247249719 0 0 0 (*) + + +------ + +**Examples adjusting the "Filler value" for no-covered intervals** + +The default value is '0', but you can use any other value. + +Output example with **filler = N/A**:: + + chr1 900 1000 N/A 60 N/A + chr1 1000 1500 10 60 N/A + chr1 1500 1600 N/A 60 N/A + chr1 1600 1700 N/A N/A N/A + chr1 1700 1980 N/A 50 N/A + chr1 1980 2000 N/A 50 80 + chr1 2000 2050 20 50 80 + chr1 2050 2070 20 N/A 80 + chr1 2070 2090 20 N/A N/A + chr1 2090 2100 20 N/A 20 + + +------ + +**Examples using the "sample name" labels**:: + + chrom start end WT-1 WT-2 KO-1 + chr1 900 1000 N/A 60 N/A + chr1 1000 1500 10 60 N/A + chr1 1500 1600 N/A 60 N/A + chr1 1600 1700 N/A N/A N/A + chr1 1700 1980 N/A 50 N/A + chr1 1980 2000 N/A 50 80 + chr1 2000 2050 20 50 80 + chr1 2050 2070 20 N/A 80 + chr1 2070 2090 20 N/A N/A + chr1 2090 2100 20 N/A 20 + + +------ + +**Non-numeric values** + +The input BedGraph files can contain any kind of value in the fourth column, not necessarily a numeric value. + +Input Example:: + + File-1 File-2 + chr1 200 300 Sample1 chr1 100 240 0.75 + chr1 400 450 Sample1 chr1 250 700 0.43 + chr1 530 600 Sample2 + +Output Example:: + + chr1 100 200 0 0.75 + chr1 200 240 Sample1 0.75 + chr1 240 250 Sample1 0 + chr1 250 300 Sample1 0.43 + chr1 300 400 0 0.43 + chr1 400 450 Sample1 0.43 + chr1 450 530 0 0.43 + chr1 530 600 Sample2 0.43 + chr1 600 700 0 0.43 + +@REFERENCES@ + + +