# HG changeset patch # User bernhardlutz # Date 1403109628 14400 # Node ID 75d323631dce5d3d390947caa081414341a4723d # Parent b78d20957e7fa352b5d632ed8e6508e9fb2ac83f Uploaded diff -r b78d20957e7f -r 75d323631dce Bed12ToBed6.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/Bed12ToBed6.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,28 @@ + + + + macros.xml + + + + + bed12ToBed6 + -i '$input' + > '$output' + + + + + + + + + + +**What it does** + +bed12ToBed6 is a convenience tool that converts BED features in BED12 (a.k.a. “blocked” BED features such as genes) to discrete BED6 features. For example, in the case of a gene with six exons, bed12ToBed6 would create six separate BED6 features (i.e., one for each exon). + +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce BedToBam.xml --- a/BedToBam.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/BedToBam.xml Wed Jun 18 12:40:28 2014 -0400 @@ -6,7 +6,7 @@ - bedToBam + bedtools bedtobam $ubam $bed12 -mapq $mapq @@ -30,5 +30,6 @@ bedToBam converts features in a feature file to BAM format. This is useful as an efficient means of storing large genome annotations in a compact, indexed format for visualization purposes. +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce annotateBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/annotateBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,66 @@ + + + + macros.xml + + + + + bedtools annotate + -i $inputA + -files + #for $bed in $names.beds: + $bed.input + #end for + + #if $names.names_select == 'yes': + -names + #for $bed in $names.beds: + $bed.inputName + #end for + #end if + $strand + $counts + $both + > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +bedtools annotate, well, annotates one BED/VCF/GFF file with the coverage and number of overlaps observed from multiple other BED/VCF/GFF files. In this way, it allows one to ask to what degree one feature coincides with multiple other feature types with a single command. 
+ +@REFERENCES@ + + + diff -r b78d20957e7f -r 75d323631dce bamToBed.xml --- a/bamToBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/bamToBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -6,7 +6,7 @@ - bamToBed + bedtools bamtobed $option $ed_score -i '$input' diff -r b78d20957e7f -r 75d323631dce bamToFastq.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/bamToFastq.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,32 @@ + + + + macros.xml + + + + + bedtools bamtofastq + $tags + $fq2 + -i '$input' + -fq '$output' + + + + + + + + + + + +**What it does** + +bedtools bamtofastq is a conversion utility for extracting FASTQ records from sequence alignments in BAM format. + +@REFERENCES@ + + + diff -r b78d20957e7f -r 75d323631dce bedpeToBam.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/bedpeToBam.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,39 @@ + + + + macros.xml + + + + + bedtools bedpetobam + $ubam + -mapq $mapq + -i '$input' + -g $genome + > '$output' + + + + + + + + + + + + + + +**What it does** + +Converts feature records to BAM format. + +.. class:: warningmark + +BED files must be at least BED4 to create BAM (needs name field). + +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce closestBed.xml --- a/closestBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/closestBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -16,8 +16,8 @@ > $output - - + + @@ -36,6 +36,6 @@ **What it does** Similar to intersectBed, closestBed searches for overlapping features in A and B. In the event that no feature in B overlaps the current feature in A, closestBed will report the closest (that is, least genomic distance from the start or end of A) feature in B. For example, one might want to find which is the closest gene to a significant GWAS polymorphism. Note that closestBed will report an overlapping feature as the closest—that is, it does not restrict to closest non-overlapping feature. 
- +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce clusterBed.xml --- a/clusterBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/clusterBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -6,14 +6,14 @@ - closestBed + bedtools cluster $strand -d $distance -i $inputA > $output - + @@ -33,5 +33,6 @@ bedtools cluster requires that you presort your data by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files). +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce complementBed.xml --- a/complementBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/complementBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -12,7 +12,7 @@ > $output - + @@ -26,5 +26,6 @@ .. image:: $PATH_TO_IMAGES/complement-glyph.png +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce coverageBed.xml --- a/coverageBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/coverageBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -19,7 +19,7 @@ > '$output' - + diff -r b78d20957e7f -r 75d323631dce expandBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/expandBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,29 @@ + + + + macros.xml + + + + + bedtools expand + -c $cols + -i $inputA + > $output + + + + + + + + + + +**What it does** + +Replicate lines in a file based on columns of comma-separated values. + +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce flankbed.xml --- a/flankbed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/flankbed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -20,28 +20,16 @@ #end if - + - + - - - - - - - - - - - - - + - + @@ -55,5 +43,6 @@ In order to prevent creating intervals that violate chromosome boundaries, bedtools flank requires a genome file defining the length of each chromosome or contig. +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce getfastaBed.xml --- a/getfastaBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/getfastaBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -17,7 +17,7 @@ -fo $output - + @@ -41,5 +41,7 @@ 1. 
The headers in the input FASTA file must exactly match the chromosome column in the BED file. 2. You can use the UNIX fold command to set the line width of the FASTA output. For example, fold -w 60 will make each line of the FASTA file have at most 60 nucleotides for easy viewing. + +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce groupbyBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/groupbyBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,48 @@ + + + + macros.xml + + + + + bedtools groupby + -c $cols + -g $group + -o $operation + -i $inputA + > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +Summarizes a dataset column based upon common column groupings, akin to the SQL "group by" command. + +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce intersectBed.xml --- a/intersectBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/intersectBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -27,7 +27,7 @@ > $output - + diff -r b78d20957e7f -r 75d323631dce jaccardBed.xml --- a/jaccardBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/jaccardBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -17,8 +17,8 @@ > $output - - + + @@ -40,5 +40,7 @@ .. class:: warningmark The jaccard tool requires that your data is pre-sorted by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files). + +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce linksBed.xml --- a/linksBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/linksBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -14,7 +14,7 @@ > $output - + @@ -27,5 +27,7 @@ **What it does** Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a file. This is useful for cases when one wants to manually inspect through a large set of annotations or features. 
+ +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce macros.xml --- a/macros.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/macros.xml Wed Jun 18 12:40:28 2014 -0400 @@ -20,6 +20,18 @@ + + + + + + + + + + + + bedtools @@ -29,6 +41,39 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ------ @@ -38,5 +83,6 @@ .. __: https://github.com/arq5x/bedtools2 .. __: http://cphg.virginia.edu/quinlan/ .. __: http://bioinformatics.oxfordjournals.org/content/26/6/841.short +.. __: http://bedtools.readthedocs.org/en/latest/content/bedtools-suite.html diff -r b78d20957e7f -r 75d323631dce makewindowsBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/makewindowsBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,78 @@ + + + + macros.xml + + + + + bedtools makewindows + #if $type.type_select == 'genome': + -g $type.genome + #else: + -i $type.inputA + #end if + #if $action.action_select == 'windowsize': + -w $action.windowsize + #if $action.step_size.step_size_select == 'yes': + -s $action.step_size.step_size + #end if + #else: + -n $action.number + #end if + $sourcename + > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +Makes adjacent or sliding windows across a genome or BED file. + +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce mapBed.xml --- a/mapBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/mapBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -22,25 +22,18 @@ > $output - - + + - - - - - - - - - - - - + + + + + @@ -70,6 +63,7 @@ The map tool is substantially faster in versions 2.19.0 and later. The plot below demonstrates the increased speed when, for example, counting the number of exome alignments that align to each exon. The bedtools times are compared to the bedops bedmap utility as a point of reference. 
+@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce maskFastaBed.xml --- a/maskFastaBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/maskFastaBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -14,7 +14,7 @@ -fo $output - + @@ -31,5 +31,6 @@ .. image:: $PATH_TO_IMAGES/maskfasta-glyph.png +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce mergeBed.xml --- a/mergeBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/mergeBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -18,7 +18,7 @@ > $output - + - + - - - - - - - + diff -r b78d20957e7f -r 75d323631dce multiCov.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/multiCov.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,57 @@ + + + + macros.xml + + + + + bedtools multicov + -bed $input1 + -bam + #for $bam in $bams: + $bam.input + #end for + $strand + -f $overlap + $reciprocal + $split + -q $mapq + $duplicate + $failed + $proper + > $output + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +bedtools multicov reports the count of alignments from multiple position-sorted and indexed BAM files that overlap intervals in a BED file. Specifically, for each BED interval provided, it reports a separate count of overlapping alignments from each BAM file. + +.. class:: infomark + +bedtools multicov depends upon indexed BAM files in order to count the number of overlaps in each BAM file. As such, each BAM file should be position sorted (samtools sort aln.bam aln.sort) and indexed (samtools index aln.sort.bam) with either samtools or bamtools. + +@REFERENCES@ + + + diff -r b78d20957e7f -r 75d323631dce nucBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/nucBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,38 @@ + + + + macros.xml + + + + + bedtools nuc + $strand + $seq + $pattern + $case + -fi $fasta + -bed $inputA + > $output + + + + + + + + + + + + + + + +**What it does** + +Profiles the nucleotide content of intervals in a fasta file. 
+ +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce overlapBed.xml --- a/overlapBed.xml Thu Jun 05 15:25:18 2014 -0400 +++ b/overlapBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -12,8 +12,8 @@ > $output - - + + @@ -24,5 +24,6 @@ overlap computes the amount of overlap (in the case of positive values) or distance (in the case of negative values) between feature coordinates occurring on the same input line and reports the result at the end of the same line. In this way, it is a useful method for computing custom overlap scores from the output of other BEDTools. +@REFERENCES@ diff -r b78d20957e7f -r 75d323631dce randomBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/randomBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,35 @@ + + + + macros.xml + + + + + bedtools random + -g $genome + -l $length + -n $intervals + #if $seed.choose: + -seed $seed.seed + #end if + + + + + + + + + + + + + +**What it does** + +bedtools random will generate a random set of intervals in BED6 format. One can specify both the number (-n) and the size (-l) of the intervals that should be generated. + +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce reldist.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/reldist.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,37 @@ + + + + macros.xml + + + + + bedtools reldist + -a $inputA + -b $inputB + $detail + + + + + + + + + + + + +**What it does** + +Traditional approaches to summarizing the similarity between two sets of genomic intervals are based upon the number or proportion of intersecting intervals. However, such measures are largely blind to spatial correlations between the two sets where, despite consistent spacing or proximity, intersections are rare (for example, enhancers and transcription start sites rarely overlap, yet they are much closer to one another than two sets of random intervals). 
Favorov et al. [1] proposed a relative distance metric that describes the distribution of relative distances between each interval in one set and the two closest intervals in another set (see figure above). If there is no spatial correlation between the two sets, one would expect the relative distances to be uniformly distributed among the relative distances ranging from 0 to 0.5. If, however, the intervals tend to be much closer than expected by chance, the distribution of observed relative distances would be shifted towards low relative distance values (e.g., the figure below). +.. image:: $PATH_TO_IMAGES/reldist-glyph.png + + +.. image:: $PATH_TO_IMAGES/reldist-plot.png +.. class:: infomark + +@REFERENCES@ + + + diff -r b78d20957e7f -r 75d323631dce shuffleBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/shuffleBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,69 @@ + + + + macros.xml + + + + + bedtools shuffle + -g $genome + -i $inputA + $bedpe + -n $intervals + #if $seed.choose: + -seed $seed.seed + #end if + #if $excl.choose: + -excl $excl.excl + #end if + #if $incl.choose: + -incl $incl.incl + #end if + $chrom + -f $overlap + $chromfirst + $nooverlap + $allowBeyond + -maxTries $maxtries + > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +bedtools shuffle will randomly permute the genomic locations of a feature file among a genome defined in a genome file. One can also provide an “exclusions” BED/GFF/VCF file that lists regions where you do not want the permuted features to be placed. For example, one might want to prevent features from being placed in known genome gaps. shuffle is useful as a null basis against which to test the significance of associations of one feature with another. +.. 
image:: $PATH_TO_IMAGES/shuffle-glyph.png +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce slopBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/slopBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,49 @@ + + + + macros.xml + + + + + bedtools slop + $pct + $strand + -g $genome + -i $inputA + #if $addition.addition_select == 'b': + -b $b + #else: + -l $l + -r $r + #end if + $header + + > $output + + + + + + + + + + + + + + + + +**What it does** + +bedtools slop will increase the size of each feature in a feature file by a user-defined number of bases. While something like this could be done with an awk '{OFS="\t" print $1,$2-,$3+}', bedtools slop will restrict the resizing to the size of the chromosome (i.e. no start < 0 and no end > chromosome size). +.. image:: $PATH_TO_IMAGES/slop-glyph.png + +.. class:: warningmark + +In order to prevent the extension of intervals beyond chromosome boundaries, bedtools slop requires a genome file defining the length of each chromosome or contig. 
+@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce static/images/reldist-glyph.png Binary file static/images/reldist-glyph.png has changed diff -r b78d20957e7f -r 75d323631dce static/images/reldist-plot.png Binary file static/images/reldist-plot.png has changed diff -r b78d20957e7f -r 75d323631dce static/images/shuffle-glyph.png Binary file static/images/shuffle-glyph.png has changed diff -r b78d20957e7f -r 75d323631dce static/images/slop-glyph.png Binary file static/images/slop-glyph.png has changed diff -r b78d20957e7f -r 75d323631dce static/images/subtract-glyph.png Binary file static/images/subtract-glyph.png has changed diff -r b78d20957e7f -r 75d323631dce static/images/window-glyph.png Binary file static/images/window-glyph.png has changed diff -r b78d20957e7f -r 75d323631dce subtractBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/subtractBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,47 @@ + + + + macros.xml + + + + + bedtools subtract + $strand + -a $inputA + -b $inputB + -f $overlap + $removeIfOverlap + > $output + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +bedtools subtract searches for features in B that overlap A. If an overlapping feature is found in B, the overlapping portion is removed from A and the remaining portion of A is reported. If a feature in B overlaps all of a feature in A, the A feature will not be reported. + +.. 
image:: $PATH_TO_IMAGES/subtract-glyph.png +@REFERENCES@ + + diff -r b78d20957e7f -r 75d323631dce tagBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tagBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,52 @@ + + + + macros.xml + + + + + bedtools tag + -i $inputA + -files + #for $bed in $beds: + $bed.input + #end for + -f $overlap + $strand + -tag $tag + $field + > $output + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +Annotates the alignments in an input BAM file based on their overlaps with intervals in multiple BED/GFF/VCF files. + +@REFERENCES@ + + + diff -r b78d20957e7f -r 75d323631dce test-data/A.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/A.bed Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,5 @@ +chr1 100 200 +chr1 180 250 +chr1 250 500 +chr1 501 1000 + diff -r b78d20957e7f -r 75d323631dce test-data/expandInput.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/expandInput.bed Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,2 @@ +chr1 10 20 1,2,3 10,20,30 +chr1 40 50 4,5,6 40,50,60 diff -r b78d20957e7f -r 75d323631dce test-data/groupbyinput.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/groupbyinput.bed Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,14 @@ +chr21 9719758 9729320 variant1 chr21 9719768 9721892 ALR/Alpha 1004 + +chr21 9719758 9729320 variant1 chr21 9721905 9725582 ALR/Alpha 1010 + +chr21 9719758 9729320 variant1 chr21 9725582 9725977 L1PA3 3288 + +chr21 9719758 9729320 variant1 chr21 9726021 9729309 ALR/Alpha 1051 + +chr21 9729310 9757478 variant2 chr21 9729320 9729809 L1PA3 3897 - +chr21 9729310 9757478 variant2 chr21 9729809 9730866 L1P1 8367 + +chr21 9729310 9757478 variant2 chr21 9730866 9734026 ALR/Alpha 1036 - +chr21 9729310 9757478 variant2 chr21 9734037 9757471 ALR/Alpha 1182 - +chr21 9795588 9796685 variant3 chr21 9795589 9795713 (GAATG)n 308 + +chr21 9795588 9796685 variant3 chr21 9795736 9795894 (GAATG)n 683 + +chr21 9795588 9796685 variant3 chr21 9795911 9796007 (GAATG)n 
345 + +chr21 9795588 9796685 variant3 chr21 9796028 9796187 (GAATG)n 756 + +chr21 9795588 9796685 variant3 chr21 9796202 9796615 (GAATG)n 891 + +chr21 9795588 9796685 variant3 chr21 9796637 9796824 (GAATG)n 621 + diff -r b78d20957e7f -r 75d323631dce test-data/mygenome.bed --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/mygenome.bed Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,2 @@ +chr1 1000 +chr2 800 diff -r b78d20957e7f -r 75d323631dce windowBed.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/windowBed.xml Wed Jun 18 12:40:28 2014 -0400 @@ -0,0 +1,73 @@ + + + + macros.xml + + + + + bedtools window + #if $inputA.ext == "bam": + -abam $inputA + #else: + -a $inputA + #end if + -b $inputB + $ubam + $bed + $strandB + #if $addition.addition_select == 'b': + -w $b + #elif $addition.addition_select == 'lr': + -l $l + -r $r + #end if + $original + $number + $nooverlaps + $header + > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +Similar to bedtools intersect, window searches for overlapping features in A and B. However, window adds a specified number (1000, by default) of base pairs upstream and downstream of each feature in A. In effect, this allows features in B that are “near” features in A to be detected. + +.. image:: $PATH_TO_IMAGES/window-glyph.png +@REFERENCES@ + +